[ 
https://issues.apache.org/jira/browse/SPARK-39993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17581492#comment-17581492
 ] 

Shrikant Prasad commented on SPARK-39993:
-----------------------------------------

[~h.liashchuk] The code snippet you have shared is working fine in cluster 
deploy mode if we write to hdfs instead of s3. So, don't think there is any 
issue with k8s master. You might also first check what's the output of 
df.show() to see if the df contains the expected rows or not.

> Spark on Kubernetes doesn't filter data by date
> -----------------------------------------------
>
>                 Key: SPARK-39993
>                 URL: https://issues.apache.org/jira/browse/SPARK-39993
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.2.2
>         Environment: Kubernetes v1.23.6
> Spark 3.2.2
> Java 1.8.0_312
> Python 3.9.13
> Aws dependencies:
> aws-java-sdk-bundle-1.11.901.jar and hadoop-aws-3.3.1.jar
>            Reporter: Hanna Liashchuk
>            Priority: Major
>              Labels: kubernetes
>
> I'm creating a Dataset with type date and saving it into s3. When I read it 
> and try to use where() clause, I've noticed it doesn't return data even 
> though it's there
> Below is the code snippet I'm running
>  
> {code:java}
> from pyspark.sql.types import Row
> from pyspark.sql.functions import *
> ds = spark.range(10).withColumn("date", lit("2022-01-01")).withColumn("date", 
> col("date").cast("date"))
> ds.where("date = '2022-01-01'").show()
> ds.write.mode("overwrite").parquet("s3a://bucket/test")
> df = spark.read.format("parquet").load("s3a://bucket/test")
> df.where("date = '2022-01-01'").show()
> {code}
> The first show() returns data, while the second one - no.
> I've noticed that it's Kubernetes master related, as the same code snipped 
> works ok with master "local"
> UPD: if the column is used as a partition and has the type "date" there is no 
> filtering problem.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to