[jira] [Issue Comment Deleted] (SPARK-29068) CSV read reports incorrect row count

2019-09-12 Thread HondaWei (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HondaWei updated SPARK-29068:
-
Comment: was deleted

(was: [~tdiesler], could you provide your test CSV data?

Thanks!)

> CSV read reports incorrect row count
> 
>
> Key: SPARK-29068
> URL: https://issues.apache.org/jira/browse/SPARK-29068
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.4.4
> Reporter: Thomas Diesler
> Priority: Major
>
> Reading the [SFNY example 
> data|https://github.com/jadeyee/r2d3-part-1-data/blob/master/part_1_data.csv] 
> in Java like this ...
> {code:java}
> // Assumes: import static org.apache.spark.sql.types.DataTypes.*;
> Path srcdir = Paths.get("src/test/resources");
> Path inpath = srcdir.resolve("part_1_data.csv");
> // getOrCreateSession() is a helper that is not shown in the report.
> SparkSession session = getOrCreateSession();
> Dataset<Row> dataset = session.read()
>     //.option("header", true)
>     .option("mode", "DROPMALFORMED")
>     .schema(new StructType()
>         .add("insf", IntegerType, false)
>         .add("beds", DoubleType, false)
>         .add("baths", DoubleType, false)
>         .add("price", IntegerType, false)
>         .add("year", IntegerType, false)
>         .add("sqft", IntegerType, false)
>         .add("prcsqft", IntegerType, false)
>         .add("elevation", IntegerType, false))
>     .csv(inpath.toString());
> {code}
> This incorrectly reports 495 rows instead of 492. It seems to include the
> three header rows in the count.
> Without DROPMALFORMED, it also produces 495 rows, three of which contain only
> null values. This seems incorrect as well, because the schema explicitly
> declares every field as non-nullable.
> The same code works correctly with Spark 2.1.0.
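
A plain-Java sketch of the row count the report expects, assuming DROPMALFORMED is meant to discard every line with a field that does not parse under the declared schema. It does not call Spark; the class name, the helper names, and the sample lines are made up for illustration and are not taken from part_1_data.csv.

```java
import java.util.List;

final class DropMalformedSketch {
    // Hypothetical helper: true when all eight fields parse under the schema
    // from the report (int, double, double, then five ints).
    static boolean matchesSchema(String line) {
        String[] f = line.split(",", -1);
        if (f.length != 8) {
            return false;
        }
        try {
            Integer.parseInt(f[0].trim());
            Double.parseDouble(f[1].trim());
            Double.parseDouble(f[2].trim());
            for (int i = 3; i < 8; i++) {
                Integer.parseInt(f[i].trim());
            }
            return true;
        } catch (NumberFormatException e) {
            // A header line such as "insf,beds,..." ends up here.
            return false;
        }
    }

    // Counts only the lines that would survive a drop-malformed pass.
    static long countWellFormed(List<String> lines) {
        return lines.stream().filter(DropMalformedSketch::matchesSchema).count();
    }

    public static void main(String[] args) {
        // Made-up sample: one header line followed by two data lines.
        List<String> lines = List.of(
            "insf,beds,baths,price,year,sqft,prcsqft,elevation",
            "0,2.0,1.0,999000,1960,1000,999,10",
            "1,1.0,1.0,725000,2005,859,844,96");
        System.out.println(lines.size() + " lines read, "
            + countWellFormed(lines) + " well-formed");
    }
}
```

Under this model a header line can never count as a row, which is the behavior the reporter expected from the 492 figure.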



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29067) divide function does not throw an error, if a number is not passed to it

2019-09-12 Thread HondaWei (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928688#comment-16928688
 ] 

HondaWei commented on SPARK-29067:
--

I'll try to figure out this issue.

> divide function does not throw an error, if a number is not passed to it
> 
>
> Key: SPARK-29067
> URL: https://issues.apache.org/jira/browse/SPARK-29067
> Project: Spark
> Issue Type: Bug
> Components: Java API
> Affects Versions: 2.4.3
> Environment: Apache Spark Java: 2.4.3
> Reporter: Mangesh Rananavare
> Priority: Major
>
> Example:
> dataset.where(col("col1").divide("col2").$greater(10)).show();
> If you look closely, I forgot to wrap the divide parameter "col2" in the
> col() function, so I am effectively passing a String. I expected this to
> throw a NumberFormatException, but instead the where clause resolves to null
> and I get an empty dataset as the result of the computation above.
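
A plain-Java model of why the filter above can come back empty, assuming the string "col2" is treated as a literal whose numeric cast yields null and that a null predicate keeps no rows (SQL three-valued logic). It does not call Spark; the class and method names are hypothetical and only mirror the shape of the expression in the report.

```java
import java.util.List;
import java.util.stream.Collectors;

final class NullPredicateSketch {
    // Models a string-to-number cast: unparseable text becomes null
    // instead of raising an exception.
    static Double castToDouble(String s) {
        try {
            return Double.valueOf(s);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Null-propagating division: any null operand gives a null result.
    static Double divide(Double a, Double b) {
        if (a == null || b == null) {
            return null;
        }
        return a / b;
    }

    // Null-propagating comparison: unknown input gives an unknown (null) result.
    static Boolean greaterThan(Double value, double threshold) {
        if (value == null) {
            return null;
        }
        return value > threshold;
    }

    // Keeps a value only when the predicate is definitely TRUE.
    static List<Double> where(List<Double> col1, String literal, double threshold) {
        Double divisor = castToDouble(literal);
        return col1.stream()
            .filter(v -> Boolean.TRUE.equals(greaterThan(divide(v, divisor), threshold)))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Double> col1 = List.of(100.0, 50.0);
        // The literal "col2" is not a number, so every predicate is null.
        System.out.println(where(col1, "col2", 10));
        // A numeric literal behaves as ordinary division.
        System.out.println(where(col1, "2", 10));
    }
}
```

In the reporter's code, wrapping the argument as col("col2") would divide by the column rather than by a string literal.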






[jira] [Commented] (SPARK-29068) CSV read reports incorrect row count

2019-09-12 Thread HondaWei (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928686#comment-16928686
 ] 

HondaWei commented on SPARK-29068:
--

[~tdiesler], could you provide your test CSV data?

Thanks!







[jira] [Commented] (SPARK-24077) Issue a better error message for `CREATE TEMPORARY FUNCTION IF NOT EXISTS`

2019-07-07 Thread HondaWei (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16879858#comment-16879858
 ] 

HondaWei commented on SPARK-24077:
--

Hi [~hyukjin.kwon]

Thank you! I will trace the code and submit a fix soon, unless
[~benedict jin] is already working on it.

 

> Issue a better error message for `CREATE TEMPORARY FUNCTION IF NOT EXISTS`
> --
>
> Key: SPARK-24077
> URL: https://issues.apache.org/jira/browse/SPARK-24077
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: Benedict Jin
> Priority: Major
> Labels: starter
>
> The error message of {{CREATE TEMPORARY FUNCTION IF NOT EXISTS}} looks 
> confusing: 
> {code}
> scala> org.apache.spark.sql.SparkSession.builder().enableHiveSupport.getOrCreate.sql("CREATE TEMPORARY FUNCTION IF NOT EXISTS yuzhouwan as 'org.apache.spark.sql.hive.udf.YuZhouWan'")
> {code}
> {code}
> org.apache.spark.sql.catalyst.parser.ParseException:
> mismatched input 'NOT' expecting {'.', 'AS'}(line 1, pos 29)
> == SQL ==
> CREATE TEMPORARY FUNCTION IF NOT EXISTS yuzhouwan as 'org.apache.spark.sql.hive.udf.YuZhouWan'
> -----------------------------^^^
>  at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)
>  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:99)
>  at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:45)
>  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
>  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
>  ... 48 elided
> {code}
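
One way to picture a clearer message is a check that recognizes the statement shape before the generic parse error is raised. The sketch below is plain Java and assumes that IF NOT EXISTS is meant to be rejected for temporary functions; the class name, the regular expression, and the message wording are hypothetical and are not Spark's parser code or its actual error text.

```java
import java.util.Optional;
import java.util.regex.Pattern;

final class TempFunctionCheckSketch {
    // Matches the statement prefix from the report, ignoring case and
    // extra whitespace between keywords.
    private static final Pattern TEMP_FUNCTION_IF_NOT_EXISTS = Pattern.compile(
        "^\\s*CREATE\\s+TEMPORARY\\s+FUNCTION\\s+IF\\s+NOT\\s+EXISTS\\b",
        Pattern.CASE_INSENSITIVE);

    // Returns a targeted message for the unsupported combination,
    // or empty when this check has nothing to say about the statement.
    static Optional<String> validate(String sql) {
        if (TEMP_FUNCTION_IF_NOT_EXISTS.matcher(sql).find()) {
            return Optional.of(
                "IF NOT EXISTS cannot be combined with CREATE TEMPORARY FUNCTION");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String sql = "CREATE TEMPORARY FUNCTION IF NOT EXISTS yuzhouwan as "
            + "'org.apache.spark.sql.hive.udf.YuZhouWan'";
        System.out.println(validate(sql).orElse("no issue found by this check"));
    }
}
```

A real fix would more likely live in the SQL grammar or the parser itself, where the offending token position is available; the regular expression here only stands in for that step.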






[jira] [Commented] (SPARK-24077) Issue a better error message for `CREATE TEMPORARY FUNCTION IF NOT EXISTS`

2019-07-06 Thread HondaWei (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16879615#comment-16879615
 ] 

HondaWei commented on SPARK-24077:
--

I would like to contribute to Spark. Could someone assign this issue to me as
my first contribution? Thanks :)



