[jira] [Updated] (SPARK-39900) Querying dataframe produced by 'binaryFile' format using 'not' operator

2022-07-27 Thread Benoit Roy (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoit Roy updated SPARK-39900:
---
Summary: Querying dataframe produced by 'binaryFile' format using 'not' operator (was: Incorrect result when query dataframe produced by 'binaryFile' format)

> Querying dataframe produced by 'binaryFile' format using 'not' operator
> ---
>
> Key: SPARK-39900
> URL: https://issues.apache.org/jira/browse/SPARK-39900
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1, 3.3.0
>Reporter: Benoit Roy
>Priority: Minor
>
> When creating a dataframe using the binaryFile format, I am encountering
> unexpected results when filtering/querying with the 'not' operator.
>  
> Here's a repo that will help describe and reproduce the issue.
> [https://github.com/cccs-br/spark-binaryfile-issue]
> {code:java}
> g...@github.com:cccs-br/spark-binaryfile-issue.git {code}
>  
> Here's a very simple test case that illustrates what's going on:
> [https://github.com/cccs-br/spark-binaryfile-issue/blob/main/src/test/scala/BinaryFileSuite.scala]
> {code:java}
> test("binary file dataframe") {
>   // Load the files directly into a DataFrame using the 'binaryFile' format.
>   val df = spark
>     .read
>     .format("binaryFile")
>     .load("src/test/resources/files")
>   df.createOrReplaceTempView("files")
>   // This works as expected.
>   val like_count = spark.sql("select * from files where path like '%.csv'").count()
>   assert(like_count === 1)
>   // This does not work as expected.
>   val not_like_count = spark.sql("select * from files where path not like '%.csv'").count()
>   assert(not_like_count === 2)
>   // This used to work in 3.2.1
>   // df.filter(col("path").endsWith(".csv") === false).show()
> }{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39900) Querying dataframe produced by 'binaryFile' format using 'not' operator

2022-07-27 Thread Benoit Roy (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoit Roy updated SPARK-39900:
---
Description: 
When creating a dataframe using the binaryFile format, I am encountering
unexpected results when filtering/querying with the 'not' operator.

 

Here's a repo that will help describe and reproduce the issue.

[https://github.com/cccs-br/spark-binaryfile-issue]
{code:java}
g...@github.com:cccs-br/spark-binaryfile-issue.git {code}
 

Here's a very simple test case that illustrates what's going on:

[https://github.com/cccs-br/spark-binaryfile-issue/blob/main/src/test/scala/BinaryFileSuite.scala]

TLDR;
{code:java}
test("binary file dataframe") {
  // Load the files directly into a DataFrame using the 'binaryFile' format.
  //
  // - src/test/resources/files/
  //   - test1.csv
  //   - test2.json
  //   - test3.txt
  val df = spark
    .read
    .format("binaryFile")
    .load("src/test/resources/files")

  df.createOrReplaceTempView("files")

  // This works as expected.
  val like_count = spark.sql("select * from files where path like '%.csv'").count()
  assert(like_count === 1)

  // This does not work as expected.
  val not_like_count = spark.sql("select * from files where path not like '%.csv'").count()
  assert(not_like_count === 2)

  // This used to work in 3.2.1
  // df.filter(col("path").endsWith(".csv") === false).show()
}{code}
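For reference, here is a minimal pure-Scala sketch of the results the test expects. It does not involve Spark. It models SQL LIKE with only the '%' and '_' wildcards (no ESCAPE clause, which is an assumption). It applies the pattern to hypothetical full `path` values for the three files listed above. The binaryFile `path` column typically holds full file URIs, and the `file:/tmp/files/` prefix here is made up. Under these assumptions, NOT LIKE '%.csv' should select exactly the rows that LIKE does not select. That gives 2 of 3 rows, which matches both the asserted count and the commented-out `endsWith(".csv") === false` filter.

```scala
import java.util.regex.Pattern

// Minimal model of SQL LIKE / NOT LIKE semantics, independent of Spark.
// Assumptions: only the '%' and '_' wildcards are handled (no ESCAPE clause),
// and the paths below are hypothetical stand-ins for the real 'path' values.
object LikeSemantics {
  // Translate a LIKE pattern into a regex: '%' -> any sequence, '_' -> any
  // single character; every other character is quoted so it matches literally.
  def likeToRegex(pattern: String): String =
    pattern.toList.map {
      case '%' => ".*"
      case '_' => "."
      case c   => Pattern.quote(c.toString)
    }.mkString

  // LIKE must match the whole value; DOTALL lets '%' and '_' also match newlines.
  def like(value: String, pattern: String): Boolean =
    Pattern.compile(likeToRegex(pattern), Pattern.DOTALL).matcher(value).matches()

  // Hypothetical paths for the three files in src/test/resources/files/.
  val paths: Seq[String] = Seq(
    "file:/tmp/files/test1.csv",
    "file:/tmp/files/test2.json",
    "file:/tmp/files/test3.txt"
  )

  val likeCount: Int = paths.count(p => like(p, "%.csv"))
  // For a non-null value, NOT LIKE is simply the negation of LIKE.
  val notLikeCount: Int = paths.count(p => !like(p, "%.csv"))
  // Plain-Scala equivalent of the DataFrame filter endsWith(".csv") === false.
  val endsWithFalseCount: Int = paths.count(p => !p.endsWith(".csv"))

  def main(args: Array[String]): Unit =
    println(s"like=$likeCount notLike=$notLikeCount endsWithFalse=$endsWithFalseCount")
}
```

Assuming `path` is non-null for every binaryFile row, the LIKE and NOT LIKE counts should always add up to the total number of files. A result that breaks this points at query evaluation rather than at the predicate itself.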

  was:
When creating a dataframe using the binaryFile format, I am encountering
unexpected results when filtering/querying with the 'not' operator.

 

Here's a repo that will help describe and reproduce the issue.

[https://github.com/cccs-br/spark-binaryfile-issue]
{code:java}
g...@github.com:cccs-br/spark-binaryfile-issue.git {code}
 

Here's a very simple test case that illustrates what's going on:

[https://github.com/cccs-br/spark-binaryfile-issue/blob/main/src/test/scala/BinaryFileSuite.scala]
{code:java}
test("binary file dataframe") {
  // Load the files directly into a DataFrame using the 'binaryFile' format.
  //
  // - src/test/resources/files/
  //   - test1.csv
  //   - test2.json
  //   - test3.txt
  val df = spark
    .read
    .format("binaryFile")
    .load("src/test/resources/files")

  df.createOrReplaceTempView("files")

  // This works as expected.
  val like_count = spark.sql("select * from files where path like '%.csv'").count()
  assert(like_count === 1)

  // This does not work as expected.
  val not_like_count = spark.sql("select * from files where path not like '%.csv'").count()
  assert(not_like_count === 2)

  // This used to work in 3.2.1
  // df.filter(col("path").endsWith(".csv") === false).show()
}{code}








[jira] [Updated] (SPARK-39900) Querying dataframe produced by 'binaryFile' format using 'not' operator

2022-07-27 Thread Benoit Roy (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoit Roy updated SPARK-39900:
---
Description: 
When creating a dataframe using the binaryFile format, I am encountering
unexpected results when filtering/querying with the 'not' operator.

 

Here's a repo that will help describe and reproduce the issue.

[https://github.com/cccs-br/spark-binaryfile-issue]
{code:java}
g...@github.com:cccs-br/spark-binaryfile-issue.git {code}
 

Here's a very simple test case that illustrates what's going on:

[https://github.com/cccs-br/spark-binaryfile-issue/blob/main/src/test/scala/BinaryFileSuite.scala]
{code:java}
test("binary file dataframe") {
  // Load the files directly into a DataFrame using the 'binaryFile' format.
  //
  // - src/test/resources/files/
  //   - test1.csv
  //   - test2.json
  //   - test3.txt
  val df = spark
    .read
    .format("binaryFile")
    .load("src/test/resources/files")

  df.createOrReplaceTempView("files")

  // This works as expected.
  val like_count = spark.sql("select * from files where path like '%.csv'").count()
  assert(like_count === 1)

  // This does not work as expected.
  val not_like_count = spark.sql("select * from files where path not like '%.csv'").count()
  assert(not_like_count === 2)

  // This used to work in 3.2.1
  // df.filter(col("path").endsWith(".csv") === false).show()
}{code}

  was:
When creating a dataframe using the binaryFile format, I am encountering
unexpected results when filtering/querying with the 'not' operator.

 

Here's a repo that will help describe and reproduce the issue.

[https://github.com/cccs-br/spark-binaryfile-issue]
{code:java}
g...@github.com:cccs-br/spark-binaryfile-issue.git {code}
 

Here's a very simple test case that illustrates what's going on:

[https://github.com/cccs-br/spark-binaryfile-issue/blob/main/src/test/scala/BinaryFileSuite.scala]
{code:java}
test("binary file dataframe") {
  // Load the files directly into a DataFrame using the 'binaryFile' format.
  val df = spark
    .read
    .format("binaryFile")
    .load("src/test/resources/files")

  df.createOrReplaceTempView("files")

  // This works as expected.
  val like_count = spark.sql("select * from files where path like '%.csv'").count()
  assert(like_count === 1)

  // This does not work as expected.
  val not_like_count = spark.sql("select * from files where path not like '%.csv'").count()
  assert(not_like_count === 2)

  // This used to work in 3.2.1
  // df.filter(col("path").endsWith(".csv") === false).show()
}{code}




