Hi
Using Spark 2.1.1.2.6-1.0-129 (from Hortonworks distro) and Scala 2.11.8
and Java 1.8.0_60
I have a NiFi flow that produces more records than the Spark stream can
process within the batch interval. To avoid overflowing the Spark queue
I first tried Spark Streaming backpressure (it did not work for me), so
I fell back to the simpler but static option: spark.streaming.receiver.maxRate.
I set spark.streaming.receiver.maxRate=1. As I understand it from the
Spark manual: "If the batch processing time is more than batchinterval
then obviously the receiver's memory will start filling up and will end
up in throwing exceptions (most probably BlockNotFoundException).
Currently there is no way to pause the receiver. Using SparkConf
configuration spark.streaming.receiver.maxRate, rate of receiver can be
limited." - does that mean 1 record per second?
I have very simple code (run in spark-shell, so sc already exists):

import org.apache.nifi.remote.client.SiteToSiteClient
import org.apache.nifi.spark.NiFiReceiver
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SiteToSiteClient.Builder()
  .url("http://192.168.80.120:9090/nifi")
  .portName("testing")
  .buildConfig()
val ssc = new StreamingContext(sc, Seconds(1))
val lines = ssc.receiverStream(new NiFiReceiver(conf, StorageLevel.MEMORY_AND_DISK))
lines.print()
ssc.start()
I have loads of records waiting at the NiFi testing port. After I call
ssc.start() I get 7-9 records per batch (batch interval is 1 s). Am I
misunderstanding spark.streaming.receiver.maxRate?
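One thing worth checking (a sketch under the assumption that the setting is simply not reaching the receiver): spark.streaming.receiver.maxRate is read from the SparkConf when the context and receiver are created, so setting it on an already-running context (such as the pre-built sc in spark-shell) has no effect. With maxRate=1 and a 1 s batch interval you would expect at most ~1 record per batch, so 7-9 records suggests the limit is not being applied. Something like this (app name and value are illustrative):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Set the rate limit on the SparkConf *before* the StreamingContext exists.
val sparkConf = new SparkConf()
  .setAppName("nifi-maxrate-test")
  // Limit each receiver to at most 1 record per second.
  .set("spark.streaming.receiver.maxRate", "1")

val ssc = new StreamingContext(sparkConf, Seconds(1))
```

In spark-shell, where sc already exists, the equivalent would be to pass the setting at launch, e.g. spark-shell --conf spark.streaming.receiver.maxRate=1.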
--
Margus (margusja) Roo
http://margus.roo.ee
skype: margusja
https://www.facebook.com/allan.tuuring
+372 51 48 780