[ 
https://issues.apache.org/jira/browse/SPARK-39993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17577361#comment-17577361
 ] 

Hanna Liashchuk commented on SPARK-39993:
-----------------------------------------

Hi [~hyukjin.kwon], this issue makes it impossible to use Spark on k8s from 
SQL servers such as Kyuubi, from dbt, or in ordinary use from JupyterHub. 
Could you please take a quick look so that we can at least rule out a 
configuration issue?
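
One check that might help narrow this down is to run the same filter with 
Parquet filter pushdown switched on and off and compare the row counts. The 
sketch below is only a diagnostic idea: pushdown as the cause is a guess and 
is not confirmed by anything in the report. The helper names count_matches 
and compare_pushdown are hypothetical; spark.sql.parquet.filterPushdown is a 
standard Spark SQL setting, and the path and predicate are the ones from the 
snippet quoted below.

```python
def count_matches(spark, path, predicate, pushdown):
    """Count rows matching `predicate` with Parquet filter pushdown on or off.

    `spark` is assumed to be an existing SparkSession (as in the report).
    """
    key = "spark.sql.parquet.filterPushdown"
    previous = spark.conf.get(key)  # remember the current setting
    spark.conf.set(key, "true" if pushdown else "false")
    try:
        return spark.read.parquet(path).where(predicate).count()
    finally:
        spark.conf.set(key, previous)  # restore the original setting


def compare_pushdown(spark, path, predicate):
    """Return the match counts for both pushdown settings side by side."""
    return {
        "pushdown_on": count_matches(spark, path, predicate, True),
        "pushdown_off": count_matches(spark, path, predicate, False),
    }
```

Calling compare_pushdown(spark, "s3a://bucket/test", "date = '2022-01-01'") 
on the Kubernetes master and on master "local" should show whether the two 
counts differ. Equal counts in both settings would point away from pushdown.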

> Spark on Kubernetes doesn't filter data by date
> -----------------------------------------------
>
>                 Key: SPARK-39993
>                 URL: https://issues.apache.org/jira/browse/SPARK-39993
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.2.2
>         Environment: Kubernetes v1.23.6
> Spark 3.2.2
> Java 1.8.0_312
> Python 3.9.13
> Aws dependencies:
> aws-java-sdk-bundle-1.11.901.jar and hadoop-aws-3.3.1.jar
>            Reporter: Hanna Liashchuk
>            Priority: Major
>              Labels: kubernetes
>
> I'm creating a Dataset with a column of type date and saving it to S3. When 
> I read it back and filter it with a where() clause, no data is returned even 
> though it's there.
> Below is the code snippet I'm running:
>  
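> {code:python}
> from pyspark.sql.functions import col, lit
>
> # `spark` is the existing SparkSession (e.g. from the PySpark shell).
> # Build a small Dataset whose "date" column has type date.
> ds = spark.range(10).withColumn("date", lit("2022-01-01")).withColumn("date", 
> col("date").cast("date"))
> # Filter before writing.
> ds.where("date = '2022-01-01'").show()
> # Write to S3 as Parquet, read it back, and apply the same filter.
> ds.write.mode("overwrite").parquet("s3a://bucket/test")
> df = spark.read.format("parquet").load("s3a://bucket/test")
> df.where("date = '2022-01-01'").show()
> {code}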
> The first show() returns data, while the second one returns nothing.
> This appears to be related to the Kubernetes master, as the same code snippet 
> works fine with master "local".
> UPD: if the column is used as a partition and has the type "date", or is de 
> facto a date but has the type "string", there is no filtering problem.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
