[jira] [Comment Edited] (SPARK-37344) split function behave differently between spark 2.3 and spark 3.2

2021-11-16 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17444225#comment-17444225
 ] 

angerszhu edited comment on SPARK-37344 at 11/16/21, 8:06 AM:
--

For the same SQL:
{code}
explain extended select split('dawdawdawd',';');

{code}
In Hive 1.2:
{code}
OK
ABSTRACT SYNTAX TREE:

TOK_QUERY
   TOK_INSERT
      TOK_DESTINATION
         TOK_DIR
            TOK_TMP_FILE
      TOK_SELECT
         TOK_SELEXPR
            TOK_FUNCTION
               split
               'dawdawdawd'
               '\\\;'
{code}

In Hive 3:
{code}
OK
ABSTRACT SYNTAX TREE:

TOK_QUERY
   TOK_INSERT
      TOK_DESTINATION
         TOK_DIR
            TOK_TMP_FILE
      TOK_SELECT
         TOK_SELEXPR
            TOK_FUNCTION
               split
               'dawdawdawd'
               ';'
{code}

So the difference most likely comes from Hive's code.
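
To make the effect of the different escapings concrete, here is a minimal sketch of how each delimiter text seen in this thread behaves when used as a regular expression on the reported value `abc;def`. It rests on two assumptions that the thread does not confirm: that the pattern text printed in a plan or AST is what reaches the regex engine verbatim, and that Python's `re` module is an acceptable stand-in for the Java regex engine for these four simple patterns. The names `PATTERNS` and `split_with` are hypothetical helpers, not Spark or Hive APIs.

```python
import re

# Row inserted in the reproduction steps of the issue description.
VALUE = "abc;def"

# Delimiter texts as printed in the plans and ASTs in this thread.
# ASSUMPTION: each string is handed to the regex engine exactly as printed.
PATTERNS = {
    "plain": r";",                # Hive 3 AST
    "one_backslash": r"\;",       # Spark 2 parsed plan
    "two_backslashes": r"\\;",    # Spark 3 parsed plan
    "three_backslashes": r"\\\;", # Hive 1.2 AST
}


def split_with(pattern, value=VALUE):
    """Split value on a regex pattern, similar in spirit to SQL split(str, regex)."""
    return re.split(pattern, value)


results = {name: split_with(pattern) for name, pattern in PATTERNS.items()}

for name, parts in results.items():
    print(f"{name:18} {PATTERNS[name]!r:10} -> {parts}")
```

Under these assumptions, `;` and `\;` both match a bare semicolon and split the value in two, while `\\;` and `\\\;` require a literal backslash before the semicolon and leave `abc;def` unsplit. That would be consistent with the incorrect result described in the report, but it does not by itself show which layer adds the extra backslash.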


> split function behave differently between spark 2.3 and spark 3.2
> -
>
> Key: SPARK-37344
> URL: https://issues.apache.org/jira/browse/SPARK-37344
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.1, 3.1.2, 3.2.0
> Reporter: ocean
> Priority: Major
> Labels: incorrect
>
> When the split function is used in SQL, it behaves differently between Spark 2.3 and 3.2,
> which produces incorrect results.
> The following SQL reproduces the problem:
>  
> create table split_test (id int, name string);
> insert into split_test values (1, "abc;def");
> explain extended select split(name,';') from split_test;
>  
> Spark 3:
> spark-sql> Explain extended select split(name,';') from split_test;
> == Parsed Logical Plan ==
> 'Project [unresolvedalias('split('name, \\;), None)]
> +- 'UnresolvedRelation [split_test], [], false
>  
> Spark 2:
>  
> spark-sql> Explain extended select split(name,';') from split_test;
> == Parsed Logical Plan ==
> 'Project [unresolvedalias('split('name, \;), None)]
> +- 'UnresolvedRelation split_test
>  
> It looks like escape characters are handled differently.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-37344) split function behave differently between spark 2.3 and spark 3.2

2021-11-16 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17444225#comment-17444225
 ] 

angerszhu edited comment on SPARK-37344 at 11/16/21, 8:05 AM:
--

For the same SQL:
{code}
explain extended select split('dawdawdawd',';');

{code}
In Hive 1.2:
{code}
OK
ABSTRACT SYNTAX TREE:

TOK_QUERY
   TOK_INSERT
      TOK_DESTINATION
         TOK_DIR
            TOK_TMP_FILE
      TOK_SELECT
         TOK_SELEXPR
            TOK_FUNCTION
               split
               'dawdawdawd'
               '\\\;'
{code}

In Hive 3:
{code}
OK
ABSTRACT SYNTAX TREE:

TOK_QUERY
   TOK_INSERT
      TOK_DESTINATION
         TOK_DIR
            TOK_TMP_FILE
      TOK_SELECT
         TOK_SELEXPR
            TOK_FUNCTION
               split
               'dawdawdawd'
               ';'
{code}


was (Author: angerszhuuu):
In the latest master branch:
{code}
== Parsed Logical Plan ==
'Project [unresolvedalias('split('name, \;), None)]
+- 'UnresolvedRelation [split_test], [], false

== Analyzed Logical Plan ==
split(name, \;, -1): array
Project [split(name#225, \;, -1) AS split(name, \;, -1)#226]
+- SubqueryAlias spark_catalog.default.split_test
   +- Relation default.split_test[id#224,name#225] parquet

== Optimized Logical Plan ==
Project [split(name#225, \;, -1) AS split(name, \;, -1)#226]
+- Relation default.split_test[id#224,name#225] parquet

== Physical Plan ==
*(1) Project [split(name#225, \;, -1) AS split(name, \;, -1)#226]
+- *(1) ColumnarToRow
   +- FileScan parquet default.split_test[name#225] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/Users/yi.zhu/Documents/project/Angerszh/spark/sql/core/spark..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct
{code}





[jira] [Comment Edited] (SPARK-37344) split function behave differently between spark 2.3 and spark 3.2

2021-11-15 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17444225#comment-17444225
 ] 

angerszhu edited comment on SPARK-37344 at 11/16/21, 2:51 AM:
--

In the latest master branch:
{code}
== Parsed Logical Plan ==
'Project [unresolvedalias('split('name, \;), None)]
+- 'UnresolvedRelation [split_test], [], false

== Analyzed Logical Plan ==
split(name, \;, -1): array
Project [split(name#225, \;, -1) AS split(name, \;, -1)#226]
+- SubqueryAlias spark_catalog.default.split_test
   +- Relation default.split_test[id#224,name#225] parquet

== Optimized Logical Plan ==
Project [split(name#225, \;, -1) AS split(name, \;, -1)#226]
+- Relation default.split_test[id#224,name#225] parquet

== Physical Plan ==
*(1) Project [split(name#225, \;, -1) AS split(name, \;, -1)#226]
+- *(1) ColumnarToRow
   +- FileScan parquet default.split_test[name#225] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/Users/yi.zhu/Documents/project/Angerszh/spark/sql/core/spark..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct
{code}



was (Author: angerszhuuu):
Working on this.

