1. Maybe we can't use a customized group id in Structured Streaming.
2. When restarting after a failure or a kill, the group id changes, but the
starting offset will be the last one you consumed last time.
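As a side note, the offsets a restarted query resumes from come from the checkpoint directory, not from the Kafka consumer group; "startingOffsets" is only honored on the first run. A minimal sketch of building a per-partition startingOffsets JSON value (topic name and offsets are made up for illustration):

```python
import json

# Hypothetical topic/partition offsets; -2 means "earliest", -1 means "latest".
starting_offsets = json.dumps({"events": {"0": 1234, "1": -2}})
# Passed as .option("startingOffsets", starting_offsets) on the first run;
# on restarts, offsets recorded under checkpointLocation take precedence.
print(starting_offsets)
```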
Srinivas V wrote on Thu, Mar 19, 2020 at 12:36 PM:
> Hello,
> 1. My Kafka consumer name is randomly being
Hello,
1. My Kafka consumer name is randomly being generated by spark structured
streaming. Can I override this?
2. When testing in development, when I stop my Kafka consumer streaming job
for a couple of days and try to start it back again, the job keeps
failing for missing offsets as the
I have found many library incompatibility issues, including JVM headless
issues where I had to uninstall the headless JVM and install the JDK,
and worked through them. Anyway,
this page shows the same error as yours;
you may get away with making the changes to your pom.xml as suggested.
Hi,
I am finding it difficult to get the proper Kafka libs for Spark. The
HDP version is 3.1, and I tried the libs below, but they produce the
issues below.
*POM entry:*
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>2.0.0.3.1.0.0-78</version>
</dependency>
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.11</artifactId>
  <version>2.0.0.3.1.0.0-78</version>
</dependency>
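For comparison, the usual approach with Structured Streaming is to depend on the Spark Kafka connector artifact rather than the raw kafka jars; a sketch (the version property is a placeholder and must match the cluster's Spark build):

```xml
<!-- Sketch: pull the Structured Streaming Kafka connector instead of the
     raw kafka jars; ${spark.version} is a placeholder to fill in with the
     Spark build shipped by your HDP cluster. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
  <version>${spark.version}</version>
</dependency>
```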
Hi,
I am getting a "Py4JJavaError: An error occurred while calling o545.save"
error while executing the code below.
myDF = spark.read.format("csv") \
    .option("header", True) \
    .option("mode", "FAILFAST") \
    .schema(myManualSchema) \
    .load("C:\\Arnab\\Spark\\data\\2015-summary.csv")
Thanks for suggestion Netanel,
Sorry for the limited information; I am specifically looking for something
inside the Hadoop ecosystem.
-
Manjunath
From: Netanel Malka
Sent: Wednesday, March 18, 2020 5:26 PM
To: Manjunath Shetty H
Subject: Re: Saving Spark run stats and
I noticed that Spark handles the CAST operation in a fail-safe manner, i.e.
if the casting operation fails for some record(s), Spark doesn't fail the
entire query; instead it returns a null value for those failures.
For example, the following query:
Looking at the code it seems that this behavior
Hi All,
Want to save each Spark batch's run stats (start, end, ID, etc.) and the
watermark (last processed timestamp from the external data source).
We have tried Hive JDBC, but it is very slow due to the MR jobs it triggers.
Can't save to normal Hive tables, as that will create lots of small files
in HDFS.
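One lightweight alternative (my suggestion, not something already tried in the thread) is to append one JSON line per batch to a single stats file, which sidesteps both the MR overhead of Hive JDBC and the small-file problem; a sketch with hypothetical function and field names:

```python
import json
import os
import tempfile

# Hypothetical helper: append one JSON record per batch to a stats file.
def record_batch_stats(path, batch_id, start_ts, end_ts, watermark):
    rec = {"batch_id": batch_id, "start": start_ts,
           "end": end_ts, "watermark": watermark}
    with open(path, "a") as f:
        f.write(json.dumps(rec) + "\n")

# Demo write to a temp location (in practice this could be an HDFS path).
stats_path = os.path.join(tempfile.mkdtemp(), "batch_stats.jsonl")
record_batch_stats(stats_path, 1,
                   "2020-03-18T10:00:00", "2020-03-18T10:05:00",
                   "2020-03-18T09:59:00")
```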
Hi
I am running Spark on Kubernetes. The Spark version is 2.4.5.
While submitting a Spark job on Kubernetes I get the message below:
io.fabric8.kubernetes.client.Config : Error reading service account
token from: [/var/run/secrets/kubernetes.io/serviceaccount/token]. Ignoring.
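If the missing token is actually causing failures (rather than just the warning), the usual remedy is to run the driver under a Kubernetes service account that has a token mounted; a spark-submit sketch with placeholder names (api-server address, namespace, and the "spark" service account are assumptions to adapt):

```
spark-submit \
  --master k8s://https://<api-server-host>:<port> \
  --deploy-mode cluster \
  --conf spark.kubernetes.namespace=default \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  ...
```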