[ https://issues.apache.org/jira/browse/SPARK-30530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017290#comment-17017290 ]
Maxim Gekk commented on SPARK-30530:
------------------------------------

[~jlowe] Thank you for the bug report. I will take a look at it.

> CSV load followed by "is null" filter produces incorrect results
> ----------------------------------------------------------------
>
>                 Key: SPARK-30530
>                 URL: https://issues.apache.org/jira/browse/SPARK-30530
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Jason Darrell Lowe
>            Priority: Major
>
> Filtering on "is null" over values loaded from a CSV file has regressed recently and now produces incorrect results.
> Given a CSV file with the contents:
> {noformat:title=floats.csv}
> 100.0,1.0,
> 200.0,,
> 300.0,3.0,
> 1.0,4.0,
> ,4.0,
> 500.0,,
> ,6.0,
> -500.0,50.5
> {noformat}
> Filtering this data for the first column being null should return exactly two rows, but it is returning extraneous rows with nulls:
> {noformat}
> scala> val schema = StructType(Array(StructField("floats", FloatType, true), StructField("more_floats", FloatType, true)))
> schema: org.apache.spark.sql.types.StructType = StructType(StructField(floats,FloatType,true), StructField(more_floats,FloatType,true))
>
> scala> val df = spark.read.schema(schema).csv("floats.csv")
> df: org.apache.spark.sql.DataFrame = [floats: float, more_floats: float]
>
> scala> df.filter("floats is null").show
> +------+-----------+
> |floats|more_floats|
> +------+-----------+
> |  null|       null|
> |  null|       null|
> |  null|       null|
> |  null|       null|
> |  null|        4.0|
> |  null|       null|
> |  null|        6.0|
> +------+-----------+
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
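For reference, the expected result can be checked outside Spark entirely. The following is a minimal standalone sketch (plain Python, not Spark, added here for illustration) that parses the same `floats.csv` content and applies the `floats is null` filter by hand, treating an empty first field as a null; it confirms that exactly two rows should match:

```python
# Standalone check of the expected "floats is null" semantics on the CSV
# from the report above. This is NOT Spark code; it only shows what the
# correct answer should be.
import csv
import io

data = """100.0,1.0,
200.0,,
300.0,3.0,
1.0,4.0,
,4.0,
500.0,,
,6.0,
-500.0,50.5
"""

rows = list(csv.reader(io.StringIO(data)))

# An empty first field corresponds to a null in the "floats" column.
null_floats = [row for row in rows if row[0] == ""]

print(len(null_floats))  # expected: 2
for row in null_floats:
    # Only the two columns in the schema ("floats", "more_floats").
    print(row[:2])       # ['', '4.0'] and ['', '6.0']
```

Any extra rows in Spark's output beyond these two indicate non-null `floats` values being incorrectly nulled out by the reader or the filter.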