[ https://issues.apache.org/jira/browse/SPARK-39900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17574288#comment-17574288 ]
Benoit Roy commented on SPARK-39900:
------------------------------------

This is much appreciated! ;)

> Issue with querying dataframe produced by 'binaryFile' format using 'not' operator
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-39900
>                 URL: https://issues.apache.org/jira/browse/SPARK-39900
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.1, 3.3.0
>            Reporter: Benoit Roy
>            Priority: Minor
>
> When creating a dataframe using the binaryFile format, I am encountering unexpected results when filtering/querying with the 'not' operator.
>
> Here's a repo that will help describe and reproduce the issue:
> [https://github.com/cccs-br/spark-binaryfile-issue]
> {code:java}
> git@github.com:cccs-br/spark-binaryfile-issue.git {code}
>
> Here's a very simple test case that illustrates what's going on:
> [https://github.com/cccs-br/spark-binaryfile-issue/blob/main/src/test/scala/BinaryFileSuite.scala]
> TL;DR:
> {code:java}
> test("binary file dataframe") {
>   // Load files directly into a df using the 'binaryFile' format.
>   //
>   // - src/test/resources/files/
>   //   - test1.csv
>   //   - test2.json
>   //   - test3.txt
>   val df = spark
>     .read
>     .format("binaryFile")
>     .load("src/test/resources/files")
>   df.createOrReplaceTempView("files")
>
>   // This works as expected.
>   val like_count = spark.sql("select * from files where path like '%.csv'").count()
>   assert(like_count === 1)
>
>   // This does not work as expected.
>   val not_like_count = spark.sql("select * from files where path not like '%.csv'").count()
>   assert(not_like_count === 2)
>
>   // This used to work in 3.2.1:
>   // df.filter(col("path").endsWith(".csv") === false).show()
> }{code}
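For anyone hitting this before a fix is available, the sketch below shows one way to check whether the wrong count comes from the 'not' predicate being pushed into the binaryFile relation, plus a possible workaround. This assumes pushdown is the culprit; the derived "ext" column is a hypothetical name and not part of the original test:

{code:java}
import org.apache.spark.sql.functions.{col, substring_index}

val df = spark
  .read
  .format("binaryFile")
  .load("src/test/resources/files")

// Print the physical plan; if the NOT LIKE predicate appears in the
// scan's PushedFilters, the binaryFile source is evaluating it itself.
df.where(!col("path").like("%.csv")).explain(true)

// Possible workaround (a sketch, assuming pushdown is the problem):
// filter on a derived column ("ext" is illustrative) so the predicate
// is no longer a simple string predicate on the source's path attribute
// and cannot be translated into a data source filter.
val notCsvCount = df
  .withColumn("ext", substring_index(col("path"), ".", -1))
  .where(col("ext") =!= "csv")
  .count()
// With the three sample files above, this should give 2.
{code}

Comparing notCsvCount against the NOT LIKE count is a quick way to confirm whether the pushdown path is what misbehaves.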