Re: Streaming Backpressure with Multiple Streams
So as you were maybe thinking, it only happens with the combination:

Direct stream only + backpressure = works as expected
4x receiver on Topic A + direct stream on Topic B + backpressure = the direct stream is throttled even in the absence of scheduling delay

This is using Spark 1.5.0 on CDH.

After it's been running for several minutes, if I look at "Input Metadata" I can see that the direct stream is consuming 1 record / partition / sec. I have maxrate set at 10,000 records / partition / sec.

I'll file a bug today unless someone has any ideas.

Thanks!

Jeff

On Fri, Sep 9, 2016 at 5:54 PM, Jeff Nadler wrote:
> Yes I'll test that next.
Re: Streaming Backpressure with Multiple Streams
Yes I'll test that next.

On Sep 9, 2016 5:36 PM, "Cody Koeninger" wrote:
> Does the same thing happen if you're only using direct stream plus back
> pressure, not the receiver stream?
Re: Streaming Backpressure with Multiple Streams
Does the same thing happen if you're only using direct stream plus back pressure, not the receiver stream?

On Sep 9, 2016 6:41 PM, "Jeff Nadler" wrote:
> Maybe this is a pretty esoteric implementation, but I'm seeing some bad
> behavior with backpressure plus multiple Kafka streams / direct streams.
Streaming Backpressure with Multiple Streams
Maybe this is a pretty esoteric implementation, but I'm seeing some bad behavior with backpressure plus multiple Kafka streams / direct streams.

Here's the scenario:
We have 1 Kafka topic using the reliable receiver (4 receivers, union the result). In the same app, we consume another Kafka topic using a direct stream.

This may seem strange, but it's necessary in my application to work around another problem: maxrate is set globally in SparkConf. IMO it would be more flexible if we could set maxrate for each stream independently. Since the direct stream uses a different config parameter for maxrate, we get the desired result.

A bit hacky, I know.

Anyway, we recently turned on backpressure. It works as expected for the receiver-based stream. For the direct stream, it starts out at the maxrate (as expected) on the first batch. Then it ratchets down the consumption until it is eventually consuming 1 record / second / partition.

This happens even though there's no scheduling delay, and the receiver-based stream does not appear to be throttled.

Anyone ever see anything like this?

Thanks!

Jeff Nadler
Aerohive Networks
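
For anyone trying to reproduce this, here is a rough sketch of the configuration the message above seems to describe. The thread never names the properties, so the three keys below are an assumption: they are the standard Spark Streaming settings for backpressure, the receiver rate cap, and the Kafka direct stream rate cap. The 10,000 value is the direct stream maxrate reported in the thread; the receiver value is a made-up placeholder, and `to_submit_args` is a hypothetical helper, not part of Spark.

```python
# Assumed property names; the thread only says "maxrate" and "backpressure".
SETTINGS = {
    # Turns on backpressure for all input streams in the application.
    "spark.streaming.backpressure.enabled": "true",
    # Cap for receiver-based streams, records/sec per receiver.
    # Placeholder value: the thread does not state the receiver maxrate.
    "spark.streaming.receiver.maxRate": "1000",
    # Cap for Kafka direct streams, records/sec per partition.
    # 10000 is the value reported in the thread.
    "spark.streaming.kafka.maxRatePerPartition": "10000",
}


def to_submit_args(settings):
    """Render the settings as a flat list of spark-submit --conf arguments."""
    args = []
    for key, value in sorted(settings.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(SETTINGS)))
```

Because the two rate caps are separate properties, the receiver-based stream and the direct stream can be limited independently, which is the workaround described above. The backpressure switch, by contrast, is a single application-wide setting.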