Kafka Sink Issue

2021-08-23 Thread Amit Sharma
Hi, I am using EMR-5.33.0 (Spark 2.4.7). I am writing a job that reads from one Kafka topic and writes to another Kafka topic. We also use checkpointing with the Kafka sink, but we are facing the issues below while running the job: 2021-08-23 17:01:13.373 MicroBatchExecution stream execution thread for

Re: AWS EMR SPARK 3.1.1 date issues

2021-08-23 Thread Gourav Sengupta
Hi, the query still gives the same error if we write "SELECT * FROM table_name WHERE data_partition > CURRENT_DATE() - INTERVAL 10 DAYS". Also the queries work fine in SPARK 3.0.x, or in EMR 6.2.0. Thanks and Regards, Gourav Sengupta On Mon, Aug 23, 2021 at 1:16 PM Sean Owen wrote: > Date

Re: Java : Testing RDD aggregateByKey

2021-08-23 Thread Pedro Tuero
Same here, repartition(0) throws IllegalArgument (What I would have expected for ) , but aggregateByKey(zeroValue, 0, seqFunc, combFunc) neither throws an exception nor logs an error message. The only consequence is an empty RDD. On Sat, Aug 21, 2021 at 07:45, Jacek Laskowski
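A minimal sketch of a defensive check for the situation described above, assuming the caller would rather fail fast than silently get an empty RDD when the partition count is 0. The helper name `require_positive_partitions` is hypothetical and is not part of Spark; the commented call uses the PySpark argument order, which differs from the Java overload quoted in the message.

```python
def require_positive_partitions(num_partitions: int) -> int:
    """Return num_partitions unchanged, or raise if it is not positive.

    Hypothetical helper (not a Spark API): it rejects 0 or negative
    partition counts before they reach aggregateByKey.
    """
    if num_partitions <= 0:
        raise ValueError(
            f"Number of partitions ({num_partitions}) must be positive."
        )
    return num_partitions


# Usage with an existing pair RDD (assumed to be named `rdd`):
# result = rdd.aggregateByKey(
#     zero_value, seq_func, comb_func,
#     numPartitions=require_positive_partitions(n),
# )
```

In a unit test, the same check can be asserted directly, so a zero partition count shows up as a failing test rather than as an unexpectedly empty result.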

Re: AWS EMR SPARK 3.1.1 date issues

2021-08-23 Thread Sean Owen
Date handling was tightened up in Spark 3. I think you need to compare to a date literal, not a string literal. On Mon, Aug 23, 2021 at 5:12 AM Gourav Sengupta < gourav.sengupta.develo...@gmail.com> wrote: > Hi, > > while I am running in EMR 6.3.0 (SPARK 3.1.1) a simple query as "SELECT * > FROM
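A minimal sketch of the suggestion above, assuming the filter column is a date-typed partition column. It builds the query with a typed `DATE '...'` literal instead of a bare string. The helper name `partition_filter_query` is hypothetical, and the table and column names are the ones used elsewhere in this thread. A follow-up in the thread reports the same error with a date expression as well, so this may not resolve the failure on its own.

```python
from datetime import date


def partition_filter_query(table: str, column: str, cutoff: date) -> str:
    """Build a query that compares `column` to a typed DATE literal."""
    # DATE 'yyyy-MM-dd' is a date literal in Spark SQL, whereas
    # '2021-03-01' on its own is a string literal.
    return (
        f"SELECT * FROM {table} "
        f"WHERE {column} > DATE '{cutoff.isoformat()}'"
    )


query = partition_filter_query("table_name", "data_partition", date(2021, 3, 1))
print(query)

# With a SparkSession available (assumed to be named `spark`):
# spark.sql(query).show()
```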

AWS EMR SPARK 3.1.1 date issues

2021-08-23 Thread Gourav Sengupta
Hi, while I am running in EMR 6.3.0 (SPARK 3.1.1) a simple query as "SELECT * FROM WHERE > '2021-03-01'" the query is failing with error: --- pyspark.sql.utils.AnalysisException: