Re: structured streaming Kafka consumer group.id override

2020-03-18 Thread lec ssmi
1. Maybe we can't use a customized group id in structured streaming. 2. When restarting after a failure or a kill, the group id changes, but the starting offset will be the last one consumed in the previous run. Srinivas V wrote on Thu, Mar 19, 2020 at 12:36 PM: > Hello, > 1. My Kafka consumer name is randomly being
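For reference, a minimal sketch of the restart behaviour described above, with a hypothetical broker, topic, and checkpoint path (none of these appear in the original mail): the query resumes from the offsets recorded in its checkpoint, regardless of the auto-generated group id.

    # Sketch only (broker, topic, and paths are placeholders). Offsets are
    # tracked in the checkpoint directory, not in a Kafka consumer group,
    # so a restart resumes from the last committed batch even though the
    # auto-generated group.id changes.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-restart-sketch").getOrCreate()

    df = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "events")
          .load())

    query = (df.writeStream.format("parquet")
             .option("path", "/data/events_out")
             .option("checkpointLocation", "/checkpoints/events_job")
             .start())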

structured streaming Kafka consumer group.id override

2020-03-18 Thread Srinivas V
Hello, 1. My Kafka consumer group name is randomly generated by Spark structured streaming. Can I override this? 2. When testing in development, if I stop my Kafka consumer streaming job for a couple of days and try to start it again, the job keeps failing for missing offsets as the
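For question 2, a hedged sketch of the reader options that govern the missing-offsets failure (the broker and topic are placeholders, and setting failOnDataLoss is an assumption, not advice given in this thread): when offsets recorded in the checkpoint have already been aged out by Kafka retention, the source fails unless failOnDataLoss is set to false.

    # Sketch only (hypothetical broker/topic). failOnDataLoss=false makes the
    # Kafka source log a warning instead of failing when checkpointed offsets
    # have been deleted by retention; startingOffsets only applies to a
    # brand-new query that has no checkpoint yet.
    df = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "events")
          .option("startingOffsets", "earliest")
          .option("failOnDataLoss", "false")
          .load())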

Re: HDP 3.1 spark Kafka dependency

2020-03-18 Thread Zahid Rahman
I have run into many library incompatibility issues, including JVM headless issues where I had to uninstall the headless JVM, install the full JDK, and work through them. Anyway, this page shows the same error as yours; you may get away with making the changes to your pom.xml as suggested.

HDP 3.1 spark Kafka dependency

2020-03-18 Thread William R
Hi, I am having difficulty finding the proper Kafka libraries for Spark. The HDP version is 3.1, and I tried the libraries below, but they produce the issues below. *POM entries:* org.apache.kafka:kafka-clients:2.0.0.3.1.0.0-78 and org.apache.kafka:kafka_2.11:2.0.0.3.1.0.0-78
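One quick way to check whether the Kafka integration classes actually resolve at runtime is to create a Kafka reader, as in the sketch below (broker and topic names are placeholders); if the Kafka integration jar is missing, format("kafka") fails with a "Failed to find data source: kafka" error.

    # Sketch: if the Kafka integration jar is not on the classpath, this fails
    # with "Failed to find data source: kafka". Broker and topic are placeholders.
    probe = (spark.read.format("kafka")
             .option("kafka.bootstrap.servers", "broker1:9092")
             .option("subscribe", "test-topic")
             .load())
    probe.printSchema()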

Fwd: [ Write JSON ] - An error occurred while calling o545.save

2020-03-18 Thread Sanyal Arnab
Hi, I am getting a "Py4JJavaError: An error occurred while calling o545.save" error while executing the code below.

    myDF = spark.read.format("csv") \
        .option("header", True) \
        .option("mode", "FAILFAST") \
        .schema(myManualSchema) \
        .load("C:\\Arnab\\Spark\\data\\2015-summary.csv")
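The o545.save in the error message refers to a DataFrameWriter.save call; below is a hedged sketch of the JSON write step the subject line points to (the output path is a placeholder and is not taken from the original post).

    # Sketch of the write that o545.save likely refers to; the output path is
    # a placeholder. mode("overwrite") replaces any existing output directory.
    myDF.write.format("json") \
        .mode("overwrite") \
        .save("C:\\tmp\\2015-summary-json")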

Re: Saving Spark run stats and run watermark

2020-03-18 Thread Manjunath Shetty H
Thanks for the suggestion, Netanel. Sorry for the lack of information; I am specifically looking for something inside the Hadoop ecosystem. - Manjunath From: Netanel Malka Sent: Wednesday, March 18, 2020 5:26 PM To: Manjunath Shetty H Subject: Re: Saving Spark run stats and

Reasoning behind fail safe behaviour of cast expression

2020-03-18 Thread vatsal
I noticed that Spark handles the CAST operation in a fail-safe manner, i.e. if the cast fails for some record(s), Spark doesn't fail the entire query; instead it returns a null value for those failures. For example, the following query: Looking at the code, it seems that this behavior
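An illustrative example of the behaviour described (not necessarily the exact query from the original post): casting a non-numeric string to INT yields NULL instead of raising an error.

    # Illustration: the failed cast produces NULL rather than failing the query.
    spark.sql("SELECT CAST('abc' AS INT) AS v").show()
    # +----+
    # |   v|
    # +----+
    # |null|
    # +----+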

Saving Spark run stats and run watermark

2020-03-18 Thread Manjunath Shetty H
Hi All, I want to save each Spark batch run's stats (start, end, ID, etc.) and a watermark (the last processed timestamp from the external data source). We have tried Hive JDBC, but it is very slow due to the MR jobs it triggers. We can't save to normal Hive tables as that would create lots of small files in HDFS.
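One pattern sometimes used for this (a sketch only, not a suggestion made in the thread; the path, column names, and variables are hypothetical) is to append a one-row DataFrame of run stats per batch and compact the location periodically:

    # Sketch: record one row of run stats per batch. run_id, start_ts and
    # last_processed_ts are placeholders supplied by the job. coalesce(1)
    # keeps each run to a single file, but periodic compaction is still
    # needed to keep the overall file count down.
    from datetime import datetime

    stats = spark.createDataFrame(
        [(run_id, start_ts, datetime.now(), last_processed_ts)],
        ["run_id", "start_time", "end_time", "watermark"])

    stats.coalesce(1).write.mode("append").parquet("/warehouse/etl_run_stats")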

[Spark kubernetes] Getting message - The reason is: Keystore was tampered with, or password was incorrect"

2020-03-18 Thread manishgupta88
Hi, I am running Spark on Kubernetes. The Spark version is 2.4.5. While submitting a Spark job on Kubernetes I get the messages below: io.fabric8.kubernetes.client.Config : Error reading service account token from: [/var/run/secrets/kubernetes.io/serviceaccount/token]. Ignoring.
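For context, a hedged sketch of the Kubernetes authentication settings that control which credentials the client uses instead of the in-pod defaults (the service account name and file paths are placeholders; this is not a confirmed fix for the keystore message):

    # Sketch only: explicit Kubernetes auth configs (all values are placeholders).
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .config("spark.kubernetes.authenticate.driver.serviceAccountName", "spark")
             .config("spark.kubernetes.authenticate.submission.caCertFile", "/path/to/ca.crt")
             .config("spark.kubernetes.authenticate.submission.oauthTokenFile", "/path/to/token")
             .getOrCreate())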