[jira] [Issue Comment Deleted] (SPARK-29068) CSV read reports incorrect row count
[ https://issues.apache.org/jira/browse/SPARK-29068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

HondaWei updated SPARK-29068:
-----------------------------
    Comment: was deleted

(was: [~tdiesler], could you provide your test CSV data? Thanks!)

> CSV read reports incorrect row count
> ------------------------------------
>
>                 Key: SPARK-29068
>                 URL: https://issues.apache.org/jira/browse/SPARK-29068
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.4
>            Reporter: Thomas Diesler
>            Priority: Major
>
> Reading the [SFNY example data|https://github.com/jadeyee/r2d3-part-1-data/blob/master/part_1_data.csv] in Java like this ...
> {code:java}
> Path srcdir = Paths.get("src/test/resources");
> Path inpath = srcdir.resolve("part_1_data.csv");
> SparkSession session = getOrCreateSession();
> Dataset<Row> dataset = session.read()
>         //.option("header", true)
>         .option("mode", "DROPMALFORMED")
>         .schema(new StructType()
>                 .add("insf", IntegerType, false)
>                 .add("beds", DoubleType, false)
>                 .add("baths", DoubleType, false)
>                 .add("price", IntegerType, false)
>                 .add("year", IntegerType, false)
>                 .add("sqft", IntegerType, false)
>                 .add("prcsqft", IntegerType, false)
>                 .add("elevation", IntegerType, false))
>         .csv(inpath.toString());
> {code}
> ... incorrectly reports 495 rows instead of 492. It seems to include the three header rows in the count.
> Also, without DROPMALFORMED it creates 495 rows, three of which are all null. This also seems incorrect, because the schema explicitly requires non-null values for all fields.
> This code works fine with Spark 2.1.0.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29067) divide function does not throw an error, if a number is not passed to it
[ https://issues.apache.org/jira/browse/SPARK-29067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928688#comment-16928688 ]

HondaWei commented on SPARK-29067:
----------------------------------

I'll try to figure out this issue.

> divide function does not throw an error, if a number is not passed to it
> ------------------------------------------------------------------------
>
>                 Key: SPARK-29067
>                 URL: https://issues.apache.org/jira/browse/SPARK-29067
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.4.3
>         Environment: Apache Spark Java : 2.4.3
>            Reporter: Mangesh Rananavare
>            Priority: Major
>
> Example:
> dataset.where(col("col1").divide("col2").$greater(10)).show();
> If you look closely, I forgot to wrap the divide parameter "col2" in the col() function, so I actually pass a String. This should give me a NumberFormatException, but instead the where clause resolves to null and I get an empty dataset as the result of the above computation.
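The empty result reported above is consistent with SQL three-valued logic rather than a missed exception: the string "col2" is treated as a literal, the numeric conversion yields NULL, and a WHERE predicate that evaluates to NULL drops the row. A minimal sketch of the same NULL-propagation using stdlib SQLite (chosen only for illustration; SQLite coerces the string to 0 and makes the division NULL, while Spark nulls the cast itself, but the WHERE behavior is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(100, 2), (50, 1)])

# The buggy predicate divides by the *string* 'col2', not the column col2.
# The division evaluates to NULL, NULL > 10 is NULL, and WHERE NULL
# filters out every row -- no exception is ever thrown.
bad = conn.execute("SELECT COUNT(*) FROM t WHERE col1 / 'col2' > 10").fetchone()[0]
# The intended predicate divides by the actual column.
good = conn.execute("SELECT COUNT(*) FROM t WHERE col1 / col2 > 10").fetchone()[0]
print(bad, good)  # 0 rows match the buggy predicate, 2 match the fixed one
```

This suggests the behavior is by design in the SQL dialect rather than a bug, although an analysis-time warning for a constant non-numeric operand would catch the mistake earlier.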
[jira] [Commented] (SPARK-29068) CSV read reports incorrect row count
[ https://issues.apache.org/jira/browse/SPARK-29068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928686#comment-16928686 ]

HondaWei commented on SPARK-29068:
----------------------------------

[~tdiesler], could you provide your test CSV data? Thanks!

> CSV read reports incorrect row count
> ------------------------------------
>
>                 Key: SPARK-29068
>                 URL: https://issues.apache.org/jira/browse/SPARK-29068
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.4
>            Reporter: Thomas Diesler
>            Priority: Major
>
> Reading the [SFNY example data|https://github.com/jadeyee/r2d3-part-1-data/blob/master/part_1_data.csv] in Java like this ...
> {code:java}
> Path srcdir = Paths.get("src/test/resources");
> Path inpath = srcdir.resolve("part_1_data.csv");
> SparkSession session = getOrCreateSession();
> Dataset<Row> dataset = session.read()
>         //.option("header", true)
>         .option("mode", "DROPMALFORMED")
>         .schema(new StructType()
>                 .add("insf", IntegerType, false)
>                 .add("beds", DoubleType, false)
>                 .add("baths", DoubleType, false)
>                 .add("price", IntegerType, false)
>                 .add("year", IntegerType, false)
>                 .add("sqft", IntegerType, false)
>                 .add("prcsqft", IntegerType, false)
>                 .add("elevation", IntegerType, false))
>         .csv(inpath.toString());
> {code}
> ... incorrectly reports 495 rows instead of 492. It seems to include the three header rows in the count.
> Also, without DROPMALFORMED it creates 495 rows, three of which are all null. This also seems incorrect, because the schema explicitly requires non-null values for all fields.
> This code works fine with Spark 2.1.0.
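The expectation in the ticket is that DROPMALFORMED should reject any row that fails the declared all-non-null numeric schema, including the three leading header lines when the `header` option is off. A standalone Python sketch of that expectation (not using Spark; the header lines below are illustrative stand-ins for the real file's preamble):

```python
import csv
import io

# A file shaped like the SFNY data: three non-data lines, then data rows.
raw = io.StringIO(
    "San Francisco and New York home sales\n"
    ",,,,,,,\n"
    "in_sf,beds,baths,price,year_built,sqft,price_per_sqft,elevation\n"
    "1,2,1.0,999000,1960,1000,999,10\n"
    "0,3,2.0,1250000,1980,1600,781,0\n"
)

# Field types mirroring the schema in the report: int, double, double, int...
TYPES = [int, float, float, int, int, int, int, int]

def parse_row(fields):
    """Return a typed row, or None if the row is malformed under the schema."""
    if len(fields) != len(TYPES):
        return None
    try:
        return [t(v) for t, v in zip(TYPES, fields)]
    except ValueError:
        return None

# DROPMALFORMED semantics: keep only rows that satisfy the schema.
rows = [r for r in csv.reader(raw) if parse_row(r) is not None]
print(len(rows))  # the three header lines are dropped; 2 data rows remain
```

Under this reading, a count of 495 means the reader accepted the header rows (as nulls or otherwise) despite the non-nullable schema, which matches the reporter's claim that the pre-2.4 behavior was correct.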
[jira] [Commented] (SPARK-24077) Issue a better error message for `CREATE TEMPORARY FUNCTION IF NOT EXISTS`
[ https://issues.apache.org/jira/browse/SPARK-24077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16879858#comment-16879858 ]

HondaWei commented on SPARK-24077:
----------------------------------

Hi [~hyukjin.kwon], thank you! I am going to trace the code and fix it in the near term if [~benedict jin] doesn't work on it.

> Issue a better error message for `CREATE TEMPORARY FUNCTION IF NOT EXISTS`
> --------------------------------------------------------------------------
>
>                 Key: SPARK-24077
>                 URL: https://issues.apache.org/jira/browse/SPARK-24077
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Benedict Jin
>            Priority: Major
>              Labels: starter
>
> The error message of {{CREATE TEMPORARY FUNCTION IF NOT EXISTS}} looks confusing:
> {code}
> scala> org.apache.spark.sql.SparkSession.builder().enableHiveSupport.getOrCreate.sql("CREATE TEMPORARY FUNCTION IF NOT EXISTS yuzhouwan as 'org.apache.spark.sql.hive.udf.YuZhouWan'")
> {code}
> {code}
> org.apache.spark.sql.catalyst.parser.ParseException:
> mismatched input 'NOT' expecting {'.', 'AS'}(line 1, pos 29)
>
> == SQL ==
> CREATE TEMPORARY FUNCTION IF NOT EXISTS yuzhouwan as 'org.apache.spark.sql.hive.udf.YuZhouWan'
> -----------------------------^^^
>
>   at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)
>   at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:99)
>   at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:45)
>   at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
>   ... 48 elided
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
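One way to deliver the better error message the ticket asks for is to detect the unsupported clause before the generic grammar error fires. A minimal sketch of that idea (the helper name and the exact message wording are hypothetical, not the actual Catalyst parser code):

```python
import re

def check_create_temp_function(sql: str) -> None:
    """Raise a targeted error when IF NOT EXISTS is combined with
    CREATE TEMPORARY FUNCTION, instead of a generic 'mismatched input'."""
    pattern = r"\s*CREATE\s+TEMPORARY\s+FUNCTION\s+IF\s+NOT\s+EXISTS\b"
    if re.match(pattern, sql, re.IGNORECASE):
        raise ValueError(
            "It is not allowed to define a TEMPORARY function with IF NOT EXISTS.")

# The statement from the ticket now fails with an actionable message.
try:
    check_create_temp_function(
        "CREATE TEMPORARY FUNCTION IF NOT EXISTS yuzhouwan as "
        "'org.apache.spark.sql.hive.udf.YuZhouWan'")
except ValueError as e:
    print(e)
```

In Spark itself the equivalent check would live in the SQL grammar or the AST builder, so the statement still fails, but with an error that names the actual restriction rather than pointing at the 'NOT' token.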
[jira] [Commented] (SPARK-24077) Issue a better error message for `CREATE TEMPORARY FUNCTION IF NOT EXISTS`
[ https://issues.apache.org/jira/browse/SPARK-24077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16879615#comment-16879615 ]

HondaWei commented on SPARK-24077:
----------------------------------

I would like to contribute to Spark. Could someone assign this issue to me as my first contribution? Thanks :)

> Issue a better error message for `CREATE TEMPORARY FUNCTION IF NOT EXISTS`
> --------------------------------------------------------------------------
>
>                 Key: SPARK-24077
>                 URL: https://issues.apache.org/jira/browse/SPARK-24077
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Benedict Jin
>            Priority: Major
>              Labels: starter
>
> The error message of {{CREATE TEMPORARY FUNCTION IF NOT EXISTS}} looks confusing:
> {code}
> scala> org.apache.spark.sql.SparkSession.builder().enableHiveSupport.getOrCreate.sql("CREATE TEMPORARY FUNCTION IF NOT EXISTS yuzhouwan as 'org.apache.spark.sql.hive.udf.YuZhouWan'")
> {code}
> {code}
> org.apache.spark.sql.catalyst.parser.ParseException:
> mismatched input 'NOT' expecting {'.', 'AS'}(line 1, pos 29)
>
> == SQL ==
> CREATE TEMPORARY FUNCTION IF NOT EXISTS yuzhouwan as 'org.apache.spark.sql.hive.udf.YuZhouWan'
> -----------------------------^^^
>
>   at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)
>   at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:99)
>   at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:45)
>   at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
>   ... 48 elided
> {code}