Hi Jorn,
Just wanted to check if you got a chance to look at this problem. I couldn't
figure out why this is happening. Any help would be appreciated.
Hi Jorn,
Thanks for your kind reply. I accept that there might be something in the
code; any help would be appreciated.
To give you some insight, I checked the source message in Kafka to see
whether it had been duplicated, but I could only find it once. Also, it
would have been convincing
Hi All,
My problem is as explained below.
Environment: Spark 2.2.0 installed on CDH
Use-Case: Reading from Kafka, cleansing the data, and ingesting it into a
non-updatable database.
Problem: My streaming batch duration is 1 minute and I am receiving 3000
messages/min. I am observing a weird case where,
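For reference, the skeleton of the job looks roughly like the sketch below
(the broker address, group id, and topic name are placeholders, not my real
values):

import java.util.*;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.*;
import org.apache.spark.streaming.kafka010.*;

public class KafkaCleanseIngest {
  public static void main(String[] args) throws InterruptedException {
    SparkConf conf = new SparkConf().setAppName("kafka-cleanse-ingest");
    // 1-minute batch duration, as described above
    JavaStreamingContext jssc =
        new JavaStreamingContext(conf, Durations.minutes(1));

    Map<String, Object> kafkaParams = new HashMap<>();
    kafkaParams.put("bootstrap.servers", "broker:9092");   // placeholder
    kafkaParams.put("key.deserializer", StringDeserializer.class);
    kafkaParams.put("value.deserializer", StringDeserializer.class);
    kafkaParams.put("group.id", "cleansing-job");          // placeholder
    kafkaParams.put("enable.auto.commit", false);

    JavaInputDStream<ConsumerRecord<String, String>> stream =
        KafkaUtils.createDirectStream(
            jssc,
            LocationStrategies.PreferConsistent(),
            ConsumerStrategies.<String, String>Subscribe(
                Collections.singletonList("events"), kafkaParams));

    // cleansing + ingestion happen inside foreachRDD on `stream`

    jssc.start();
    jssc.awaitTermination();
  }
}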
Hi All,
I configured the number of task failures to 10 via spark.task.maxFailures
in my Spark application, which reads from Kafka and ingests data into
Cassandra. I observed that when the Cassandra service is down, it does not
retry the number of times I set, i.e. 10. Instead it is retrying with the
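I am setting the property roughly as in the sketch below (the app name is a
placeholder). As I understand it, the value has to be set before the
SparkContext is created and cannot be changed at runtime, a value of 10
means 9 retries (attempts = retries + 1), the default is 4, and in local
mode task retries additionally require a master of the form local[N,F]
where F is the max failures.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class MaxFailuresSketch {
  public static void main(String[] args) {
    // must be set before the context starts
    SparkConf conf = new SparkConf()
        .setAppName("kafka-to-cassandra")
        .set("spark.task.maxFailures", "10");
    JavaSparkContext sc = new JavaSparkContext(conf);
    // ... streaming setup goes here ...
    sc.close();
  }
}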
Hi Community,
Seeing the below message in the logs, and the Spark application is getting
terminated. There is an issue with our Kafka service: it auto-restarts,
during which leader re-election happens.
*Exception:*
assertion failed: Beginning offset 34242088 is after the ending offset
34242084 for
Jorn,
Thanks for the response. My downstream database is Kudu.
1. Yes. As you suggested, I have been using a central caching mechanism
that caches the RDD results and compares them with the next batch, keeping
the latest timestamps and ignoring the old ones. But I see
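Roughly, the comparison step looks like the sketch below (Record and its
fields are illustrative stand-ins for my actual cleansed message type):

import java.util.*;
import java.util.stream.Collectors;

class Record {
  final String key;
  final long timestamp;
  Record(String key, long timestamp) { this.key = key; this.timestamp = timestamp; }
}

class LatestTimestampCache {
  // newest timestamp ingested so far, per record key
  private final Map<String, Long> latestSeen = new HashMap<>();

  // keep only records newer than anything already ingested
  List<Record> filterFresh(List<Record> batch) {
    return batch.stream()
        .filter(r -> r.timestamp > latestSeen.getOrDefault(r.key, Long.MIN_VALUE))
        .collect(Collectors.toList());
  }

  // advance the cache after a successful write
  void advance(List<Record> written) {
    written.forEach(r -> latestSeen.merge(r.key, r.timestamp, Math::max));
  }
}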
Hi All,
I am using Spark 2.2.0 and have the below use case:
*Reading from Kafka using Spark Streaming and updating (not just inserting)
the records in a downstream database*
I understand that Spark will not read messages from Kafka in the timestamp
order in which they were produced, since Kafka only guarantees ordering
within a partition and Spark consumes the partitions in parallel
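One direction I have been considering, just as a sketch and possibly not
the right fix: collapse each micro-batch to the newest record per key
before touching the database, so an out-of-order older message cannot
overwrite a newer row. I extract (timestamp, value) pairs first because
ConsumerRecord itself is not serializable and reduceByKey shuffles.

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import scala.Tuple2;

public class LatestPerKeySketch {
  // `stream` is the JavaInputDStream from KafkaUtils.createDirectStream
  static void updateLatestOnly(
      JavaInputDStream<ConsumerRecord<String, String>> stream) {
    stream.foreachRDD(rdd ->
        rdd.mapToPair(rec ->
               new Tuple2<>(rec.key(), new Tuple2<>(rec.timestamp(), rec.value())))
           .reduceByKey((a, b) -> a._1() >= b._1() ? a : b)
           .foreach(kv -> {
             // kv._1() = key, kv._2()._2() = newest value in this batch;
             // issue the downstream update here
           }));
  }
}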
Hi All,
Is there any property which makes my Spark Streaming application
single-threaded?
I researched the property *spark.dynamicAllocation.maxExecutors=1*, but as
far as I understand this launches at most one container, not a single
thread. In local mode, we can configure the
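The closest I have found so far is sketched below; I would be glad to be
corrected. In local mode the master URL fixes the thread count, and on a
cluster one single-core executor still runs each task in its own thread,
just never more than one task at a time.

import org.apache.spark.SparkConf;

public class SingleThreadSketch {
  public static void main(String[] args) {
    // Local mode: exactly one worker thread.
    SparkConf local = new SparkConf()
        .setAppName("single-threaded-stream")
        .setMaster("local[1]");

    // Cluster mode: nearest equivalent is one executor with one core.
    SparkConf cluster = new SparkConf()
        .setAppName("single-threaded-stream")
        .set("spark.dynamicAllocation.enabled", "false")
        .set("spark.executor.instances", "1")
        .set("spark.executor.cores", "1");
  }
}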
Hi All,
I have the below problem in Spark Kafka streaming.
Environment:
Spark-2.2.0
Problem:
We have written our own logic for offset management in ZooKeeper when
streaming data with Spark + Kafka. Everything is working fine, and we are
able to control the offset commits to ZooKeeper during
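For context, we pull the ranges to commit inside foreachRDD, roughly like
the sketch below; ZkOffsetStore stands in for our own ZooKeeper writer and
is not a Spark API.

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.kafka010.HasOffsetRanges;
import org.apache.spark.streaming.kafka010.OffsetRange;

public class OffsetCommitSketch {
  static void commitToZk(
      JavaInputDStream<ConsumerRecord<String, String>> stream,
      ZkOffsetStore zkStore) {
    stream.foreachRDD(rdd -> {
      // the direct stream's RDDs carry their Kafka offset ranges
      OffsetRange[] ranges = ((HasOffsetRanges) rdd.rdd()).offsetRanges();
      for (OffsetRange r : ranges) {
        zkStore.save(r.topic(), r.partition(), r.untilOffset());
      }
    });
  }

  interface ZkOffsetStore { // stand-in for our ZooKeeper writer
    void save(String topic, int partition, long untilOffset);
  }
}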
*Environment:*
Spark 2.2.0
*Kafka:* 0.10.0
*Language:* Java
*Use-Case:* Streaming data from Kafka using JavaDStreams and storing the
data in a downstream database.
*Issue:*
I have a use case wherein I have to launch a background thread that
connects to a DB and caches the retrieved
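What I have in mind is something like the sketch below (class and method
names are made up; loadFromDb stands in for the real DB call). One thing I
am aware of: in Spark, such a singleton exists once per JVM, i.e. once on
the driver and once in each executor.

import java.util.Collections;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public final class DbCache {
  // refreshed in the background; volatile so readers see the latest map
  private static volatile Map<String, String> cache = loadFromDb();

  private static final ScheduledExecutorService REFRESHER =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "db-cache-refresher");
        t.setDaemon(true); // don't keep the JVM alive on shutdown
        return t;
      });

  static {
    REFRESHER.scheduleAtFixedRate(
        () -> cache = loadFromDb(), 5, 5, TimeUnit.MINUTES);
  }

  public static Map<String, String> get() {
    return cache;
  }

  private static Map<String, String> loadFromDb() {
    // stand-in for the real DB query
    return Collections.emptyMap();
  }
}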
Hi All,
I am trying to read data from Kafka and ingest it into Kudu using Spark
Streaming. I am not using KuduContext to perform the upsert into Kudu.
Instead, I am using Kudu's native client API to build a PartialRow and
apply the operation for every record from Kafka. I am able to run the
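The per-record path looks roughly like the sketch below (master address,
table name, column names, and values are placeholders):

import org.apache.kudu.client.*;

public class KuduUpsertSketch {
  public static void main(String[] args) throws KuduException {
    KuduClient client =
        new KuduClient.KuduClientBuilder("kudu-master:7051").build();
    KuduTable table = client.openTable("events");
    KuduSession session = client.newSession();
    session.setFlushMode(SessionConfiguration.FlushMode.MANUAL_FLUSH);

    // one operation per Kafka record
    Upsert upsert = table.newUpsert();
    PartialRow row = upsert.getRow();
    row.addString("id", "key-1");
    row.addLong("ts", 1530000000L);
    session.apply(upsert);

    session.flush(); // push the buffered operations
    session.close();
    client.close();
  }
}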