[jira] [Commented] (SPARK-33883) Can repeat "where" twice without error in spark sql

2020-12-28 Thread Liu Neng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255515#comment-17255515
 ] 

Liu Neng commented on SPARK-33883:
--

!image-2020-12-28-18-24-18-395.png!

The first "where" is parsed as a table alias, not as the WHERE keyword, so the query is syntactically valid. You can confirm this with 'select where.* from person where where name is not null', which selects every column through the alias "where".

You can set spark.sql.ansi.enabled=true to make Spark raise a parse exception in this case:

!image-2020-12-28-18-32-25-960.png!

So I don't think this is an issue.

> Can repeat "where" twice without error in spark sql
> ---
>
> Key: SPARK-33883
> URL: https://issues.apache.org/jira/browse/SPARK-33883
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3
>Reporter: Stu
>Priority: Minor
> Attachments: image-2020-12-28-18-24-18-395.png, 
> image-2020-12-28-18-32-25-960.png
>
>
> the following sql code works, despite having bad syntax ("where" is mentioned 
> twice):
> {code:sql}
> select * from table
> where where field is not null{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33883) Can repeat "where" twice without error in spark sql

2020-12-28 Thread Liu Neng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Neng updated SPARK-33883:
-
Attachment: image-2020-12-28-18-32-25-960.png







[jira] [Updated] (SPARK-33883) Can repeat "where" twice without error in spark sql

2020-12-28 Thread Liu Neng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Neng updated SPARK-33883:
-
Attachment: image-2020-12-28-18-24-18-395.png







[jira] [Comment Edited] (SPARK-33632) to_date doesn't behave as documented

2020-12-03 Thread Liu Neng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243681#comment-17243681
 ] 

Liu Neng edited comment on SPARK-33632 at 12/4/20, 3:46 AM:


This is not an issue; the docs have been misread.

Use the pattern M/d/yy (capital 'M' is month-of-year; lowercase 'm' means minute-of-hour, which is why the month in the reported output fell back to 01). The year parse mode is determined by the count of the letter 'y': 'yy' resolves a two-digit year against a base of 2000, while a single 'y' parses "20" as the literal year 20.

Below is the relevant source code from DateTimeFormatterBuilder:

!image-2020-12-04-11-45-10-379.png!


was (Author: qwe1398775315):
You should use the pattern m/d/yy; the parse mode is determined by the count of the letter 'y'.

Below is the relevant source code from DateTimeFormatterBuilder:

!image-2020-12-04-11-45-10-379.png!
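The 'y'-count behavior can be checked directly against java.time's DateTimeFormatter, on which Spark 3.x's datetime parsing is built (a standalone sketch, independent of Spark; the class name is illustrative):

{code:java}
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class YearPatternDemo {
    public static void main(String[] args) {
        // "yy" is a reduced-value field: two digits are resolved against a base of 2000.
        DateTimeFormatter twoDigitYear = DateTimeFormatter.ofPattern("M/d/yy");
        System.out.println(LocalDate.parse("10/31/20", twoDigitYear)); // 2020-10-31

        // A single "y" parses the digits as the literal year value, so "20" means year 20.
        DateTimeFormatter literalYear = DateTimeFormatter.ofPattern("M/d/y");
        System.out.println(LocalDate.parse("10/31/20", literalYear)); // 0020-10-31
    }
}
{code}

The second result matches the "0020-…" output reported in the issue.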

> to_date doesn't behave as documented
> 
>
> Key: SPARK-33632
> URL: https://issues.apache.org/jira/browse/SPARK-33632
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Frank Oosterhuis
>Priority: Major
> Attachments: image-2020-12-04-11-45-10-379.png
>
>
> I'm trying to use to_date on a string formatted as "10/31/20".
> Expected output is "2020-10-31".
> Actual output is "0020-01-31".
> The 
> [documentation|https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html]
>  suggests 2020 or 20 as input for "y".
> Example below. Expected behaviour is included in the udf.
> {code:scala}
> import java.sql.Date
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.functions.{to_date, udf}
> object ToDate {
>   val toDate = udf((date: String) => {
> val split = date.split("/")
> val month = "%02d".format(split(0).toInt)
> val day = "%02d".format(split(1).toInt)
> val year = split(2).toInt + 2000
> Date.valueOf(s"${year}-${month}-${day}")
>   })
>   def main(args: Array[String]): Unit = {
> val spark = SparkSession.builder().master("local[2]").getOrCreate()
> spark.sparkContext.setLogLevel("ERROR")
> import spark.implicits._
> Seq("1/1/20", "10/31/20")
>   .toDF("raw")
>   .withColumn("to_date", to_date($"raw", "m/d/y"))
>   .withColumn("udf", toDate($"raw"))
>   .show
>   }
> }
> {code}






[jira] [Updated] (SPARK-33632) to_date doesn't behave as documented

2020-12-03 Thread Liu Neng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Neng updated SPARK-33632:
-
Attachment: image-2020-12-04-11-45-10-379.png







[jira] [Commented] (SPARK-33632) to_date doesn't behave as documented

2020-12-03 Thread Liu Neng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243681#comment-17243681
 ] 

Liu Neng commented on SPARK-33632:
--

You should use the pattern M/d/yy (capital 'M' is month-of-year; lowercase 'm' means minute-of-hour). The year parse mode is determined by the count of the letter 'y': 'yy' resolves a two-digit year against a base of 2000.

Below is the relevant source code from DateTimeFormatterBuilder:

!image-2020-12-04-11-45-10-379.png!







[jira] [Commented] (SPARK-33317) Spark Hive SQL returning empty dataframe

2020-11-02 Thread Liu Neng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225141#comment-17225141
 ] 

Liu Neng commented on SPARK-33317:
--

I ran these queries on Spark 3.0.0: condition 1 (between ' 1000405134' and '1000772585') finds 6012 records, and condition 2 (between '1000405134' and '1000772585') finds 2798 records.

I find that the comparison in the generated code is between UTF8String values:

!image-2020-11-03-13-30-12-049.png!

" 1000405134" sorts before "1000405134", because a space precedes '1' in the character order, so the space-padded bound widens the range rather than emptying it.

I don't think this is an issue, because the values being compared are Strings, not Numbers. I analyzed the parse tree, and '1000405134' is a String literal.
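The ordering claim is easy to verify with plain Java string comparison, which for ASCII data matches the byte-wise order UTF8String uses (a sketch; the sample value "1000200000" is hypothetical, chosen to fall between the two lower bounds):

{code:java}
public class StringRangeDemo {
    public static void main(String[] args) {
        // ' ' (0x20) sorts before '1' (0x31), so the space-padded literal is the smaller bound.
        System.out.println(" 1000405134".compareTo("1000405134") < 0); // true

        // A value such as "1000200000" lies inside the widened range
        // [' 1000405134', '1000772585'] but outside ['1000405134', '1000772585'],
        // which is why the padded BETWEEN can match more rows, not fewer.
        String v = "1000200000";
        boolean inWidened = v.compareTo(" 1000405134") >= 0 && v.compareTo("1000772585") <= 0;
        boolean inNarrow  = v.compareTo("1000405134") >= 0 && v.compareTo("1000772585") <= 0;
        System.out.println(inWidened); // true
        System.out.println(inNarrow);  // false
    }
}
{code}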

> Spark Hive SQL returning empty dataframe
> 
>
> Key: SPARK-33317
> URL: https://issues.apache.org/jira/browse/SPARK-33317
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Shell
>Affects Versions: 2.4.6
>Reporter: Debadutta
>Priority: Major
> Attachments: farmers.csv, image-2020-11-03-13-30-12-049.png
>
>
> I am trying to run a sql query on a hive table using hive connector in spark 
> but I am getting an empty dataframe. The query I am trying to run:-
> {{sparkSession.sql("select fmid from farmers where fmid between ' 1000405134' 
> and '1000772585'")}}
> This is failing but if I remove the leading whitespaces it works.
> {{sparkSession.sql("select fmid from farmers where fmid between '1000405134' 
> and '1000772585'")}}
> Currently, I am removing leading and trailing whitespaces as a workaround. 
> But the same query with whitespaces works fine in hive console.






[jira] [Updated] (SPARK-33317) Spark Hive SQL returning empty dataframe

2020-11-02 Thread Liu Neng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Neng updated SPARK-33317:
-
Attachment: image-2020-11-03-13-30-12-049.png



