[ https://issues.apache.org/jira/browse/SPARK-39993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690342#comment-17690342 ]
Hanna Liashchuk commented on SPARK-39993:
-----------------------------------------

Any update here? [~unamesk15]

> Spark on Kubernetes doesn't filter data by date
> -----------------------------------------------
>
>                 Key: SPARK-39993
>                 URL: https://issues.apache.org/jira/browse/SPARK-39993
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.2.2
>         Environment: Kubernetes v1.23.6
>                      Spark 3.2.2
>                      Java 1.8.0_312
>                      Python 3.9.13
>                      AWS dependencies: aws-java-sdk-bundle-1.11.901.jar and hadoop-aws-3.3.1.jar
>            Reporter: Hanna Liashchuk
>            Priority: Major
>              Labels: kubernetes
>
> I'm creating a Dataset with a date-typed column and saving it to S3. When I read it
> back and apply a where() clause, it returns no data even though the data is there.
> Below is the code snippet I'm running:
>
> {code:python}
> from pyspark.sql.functions import lit, col
>
> ds = spark.range(10).withColumn("date", lit("2022-01-01")).withColumn("date", col("date").cast("date"))
> ds.where("date = '2022-01-01'").show()
> ds.write.mode("overwrite").parquet("s3a://bucket/test")
>
> df = spark.read.format("parquet").load("s3a://bucket/test")
> df.where("date = '2022-01-01'").show()
> {code}
> The first show() returns data, while the second one does not.
> This appears to be related to the Kubernetes master: the same snippet works fine
> with master "local".
> UPD: if the column is used as a partition column and has the type "date", there is
> no filtering problem.
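For anyone trying to narrow this down or work around it, here is a minimal sketch based on the report above. The bucket and paths are placeholders, the partitionBy() step follows the workaround mentioned in the UPD, and toggling spark.sql.parquet.filterPushdown is only a diagnostic guess to check whether Parquet filter pushdown is involved, not a confirmed fix.

{code:python}
# Sketch only: placeholder paths, diagnostic config toggle is an assumption.
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = SparkSession.builder.getOrCreate()

# Same shape of data as in the reproduction: 10 rows with a date column.
ds = spark.range(10).withColumn("date", lit("2022-01-01").cast("date"))

# 1) Inspect what the Parquet scan pushes down on the Kubernetes cluster:
df = spark.read.parquet("s3a://bucket/test")
df.where("date = '2022-01-01'").explain(True)   # check the PushedFilters entry in the plan

# 2) Diagnostic only: rule the pushdown path in or out by disabling it for the session.
spark.conf.set("spark.sql.parquet.filterPushdown", "false")
df2 = spark.read.parquet("s3a://bucket/test")
df2.where("date = '2022-01-01'").show()

# 3) Workaround reported in the UPD: partition by the date column on write,
#    so the predicate is answered by partition pruning rather than row-group filters.
ds.write.mode("overwrite").partitionBy("date").parquet("s3a://bucket/test_partitioned")
spark.read.parquet("s3a://bucket/test_partitioned").where("date = '2022-01-01'").show()
{code}

Comparing the physical plans (in particular PushedFilters) between master "local" and the Kubernetes master should show whether the pushed-down date predicate is where the behaviour diverges.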