[jira] [Commented] (SPARK-33883) Can repeat "where" twice without error in spark sql
[ https://issues.apache.org/jira/browse/SPARK-33883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255515#comment-17255515 ]

Liu Neng commented on SPARK-33883:
----------------------------------

!image-2020-12-28-18-24-18-395.png!

The first "where" is parsed as a table alias; you can verify this with 'select where.* from person where where name is not null'. You can set spark.sql.ansi.enabled=true to make Spark raise an exception in this case.

!image-2020-12-28-18-32-25-960.png!

So I think this is not an issue.

> Can repeat "where" twice without error in spark sql
> ---------------------------------------------------
>
>                 Key: SPARK-33883
>                 URL: https://issues.apache.org/jira/browse/SPARK-33883
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.3
>            Reporter: Stu
>            Priority: Minor
>         Attachments: image-2020-12-28-18-24-18-395.png, image-2020-12-28-18-32-25-960.png
>
>
> the following sql code works, despite having bad syntax ("where" is mentioned twice):
> {code:java}
> select * from table
> where where field is not null{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
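Why the doubled keyword parses: in Spark's default (non-ANSI) mode, SQL keywords are non-reserved, so the parser may consume the first "where" as a table alias. The sketch below is a toy model of that idea in plain Java — not Spark's actual ANTLR grammar, and the method name is invented for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class AliasDemo {

    // Toy rule: if taking the first token after the table name as an
    // alias still leaves a "where" to start the filter clause, the
    // parser takes it -- even when that token is itself "where".
    static Optional<String> tableAlias(List<String> afterTable) {
        if (afterTable.size() >= 2 && afterTable.get(1).equals("where")) {
            return Optional.of(afterTable.get(0));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // tokens after "from person" in:
        // select * from person where where name is not null
        List<String> doubled = Arrays.asList("where", "where", "name", "is", "not", "null");
        System.out.println(tableAlias(doubled));  // Optional[where]

        // a single "where" is just the start of the filter clause
        List<String> single = Arrays.asList("where", "name", "is", "not", "null");
        System.out.println(tableAlias(single));  // Optional.empty
    }
}
```

With spark.sql.ansi.enabled=true the keyword becomes reserved and this alias reading is rejected, which matches the comment above.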
[jira] [Updated] (SPARK-33883) Can repeat "where" twice without error in spark sql
[ https://issues.apache.org/jira/browse/SPARK-33883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liu Neng updated SPARK-33883:
-----------------------------
    Attachment: image-2020-12-28-18-32-25-960.png
[jira] [Updated] (SPARK-33883) Can repeat "where" twice without error in spark sql
[ https://issues.apache.org/jira/browse/SPARK-33883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liu Neng updated SPARK-33883:
-----------------------------
    Attachment: image-2020-12-28-18-24-18-395.png
[jira] [Comment Edited] (SPARK-33632) to_date doesn't behave as documented
[ https://issues.apache.org/jira/browse/SPARK-33632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243681#comment-17243681 ]

Liu Neng edited comment on SPARK-33632 at 12/4/20, 3:46 AM:
------------------------------------------------------------

This is not an issue; you may have misread the docs. You should use the pattern m/d/yy: the parse mode is determined by the count of the letter 'y'. Below is the relevant source code from DateTimeFormatterBuilder.

!image-2020-12-04-11-45-10-379.png!

was (Author: qwe1398775315):
you should use pattern m/d/yy, parse mode is determined by count of letter 'y'. below is source code from DateTimeFormatterBuilder.

!image-2020-12-04-11-45-10-379.png!

> to_date doesn't behave as documented
> ------------------------------------
>
>                 Key: SPARK-33632
>                 URL: https://issues.apache.org/jira/browse/SPARK-33632
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.0.1
>            Reporter: Frank Oosterhuis
>            Priority: Major
>         Attachments: image-2020-12-04-11-45-10-379.png
>
>
> I'm trying to use to_date on a string formatted as "10/31/20".
> Expected output is "2020-10-31".
> Actual output is "0020-01-31".
> The [documentation|https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html] suggests 2020 or 20 as input for "y".
> Example below. Expected behaviour is included in the udf.
> {code:scala}
> import java.sql.Date
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.functions.{to_date, udf}
>
> object ToDate {
>   val toDate = udf((date: String) => {
>     val split = date.split("/")
>     val month = "%02d".format(split(0).toInt)
>     val day = "%02d".format(split(1).toInt)
>     val year = split(2).toInt + 2000
>     Date.valueOf(s"${year}-${month}-${day}")
>   })
>
>   def main(args: Array[String]): Unit = {
>     val spark = SparkSession.builder().master("local[2]").getOrCreate()
>     spark.sparkContext.setLogLevel("ERROR")
>     import spark.implicits._
>     Seq("1/1/20", "10/31/20")
>       .toDF("raw")
>       .withColumn("to_date", to_date($"raw", "m/d/y"))
>       .withColumn("udf", toDate($"raw"))
>       .show
>   }
> }
> {code}
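The y-count behavior described in the comment can be reproduced with plain java.time, which Spark 3.x delegates datetime parsing to. A minimal sketch (note also that lowercase 'm' in the report's pattern means minute-of-hour; month is uppercase 'M'):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class YearPatternDemo {
    public static void main(String[] args) {
        // "yy" builds a reduced-value field: a two-digit year is
        // resolved against a base of 2000, so "20" parses as 2020.
        LocalDate reduced = LocalDate.parse("10/31/20",
                DateTimeFormatter.ofPattern("M/d/yy"));
        System.out.println(reduced);  // 2020-10-31

        // A single "y" parses the digits literally: "20" is year 20 AD,
        // which explains the "0020-..." result in the report.
        LocalDate literal = LocalDate.parse("10/31/20",
                DateTimeFormatter.ofPattern("M/d/y"));
        System.out.println(literal);  // 0020-10-31
    }
}
```

So with the pattern M/d/yy the built-in to_date should return 2020-10-31 without needing the workaround UDF.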
[jira] [Updated] (SPARK-33632) to_date doesn't behave as documented
[ https://issues.apache.org/jira/browse/SPARK-33632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liu Neng updated SPARK-33632:
-----------------------------
    Attachment: image-2020-12-04-11-45-10-379.png
[jira] [Commented] (SPARK-33632) to_date doesn't behave as documented
[ https://issues.apache.org/jira/browse/SPARK-33632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243681#comment-17243681 ]

Liu Neng commented on SPARK-33632:
----------------------------------

You should use the pattern m/d/yy: the parse mode is determined by the count of the letter 'y'. Below is the relevant source code from DateTimeFormatterBuilder.

!image-2020-12-04-11-45-10-379.png!
[jira] [Commented] (SPARK-33317) Spark Hive SQL returning empty dataframe
[ https://issues.apache.org/jira/browse/SPARK-33317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225141#comment-17225141 ]

Liu Neng commented on SPARK-33317:
----------------------------------

I ran these queries on Spark 3.0.0: condition 1 (between ' 1000405134' and '1000772585') finds 6012 records, and condition 2 (between '1000405134' and '1000772585') finds 2798 records. I found that the comparator in the generated code is UTF8String.

!image-2020-11-03-13-30-12-049.png!

" 1000405134" is smaller than "1000405134". I think this isn't an issue, because the values being compared are Strings, not Numbers. I analyzed the parse tree, and "1000405134" is a String literal.

> Spark Hive SQL returning empty dataframe
> ----------------------------------------
>
>                 Key: SPARK-33317
>                 URL: https://issues.apache.org/jira/browse/SPARK-33317
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Spark Shell
>    Affects Versions: 2.4.6
>            Reporter: Debadutta
>            Priority: Major
>         Attachments: farmers.csv, image-2020-11-03-13-30-12-049.png
>
>
> I am trying to run a sql query on a hive table using hive connector in spark but I am getting an empty dataframe. The query I am trying to run:-
> {{sparkSession.sql("select fmid from farmers where fmid between ' 1000405134' and '1000772585'")}}
> This is failing but if I remove the leading whitespaces it works.
> {{sparkSession.sql("select fmid from farmers where fmid between '1000405134' and '1000772585'")}}
> Currently, I am removing leading and trailing whitespaces as a workaround. But the same query with whitespaces works fine in hive console.
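The comparison the comment points at can be sketched in plain Java; for ASCII data, String.compareTo orders the same way as the byte-wise comparison the screenshot shows for UTF8String:

```java
public class LexCompareDemo {
    public static void main(String[] args) {
        // BETWEEN on quoted values compares strings lexicographically.
        // ' ' is 0x20 and '1' is 0x31, so the bound with the leading
        // space sorts before every digit-prefixed value, which shifts
        // the lower end of the BETWEEN range.
        String withSpace = " 1000405134";
        String noSpace = "1000405134";
        System.out.println(withSpace.compareTo(noSpace) < 0);  // true
    }
}
```

This is why trimming the literal (or casting both sides to a numeric type) changes the result: the engine never treats '1000405134' as a number on its own.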
[jira] [Updated] (SPARK-33317) Spark Hive SQL returning empty dataframe
[ https://issues.apache.org/jira/browse/SPARK-33317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liu Neng updated SPARK-33317:
-----------------------------
    Attachment: image-2020-11-03-13-30-12-049.png