There are existing tickets on the issues around Kafka versions, e.g. https://issues.apache.org/jira/browse/SPARK-18057, that haven't gotten any committer weigh-in on direction.
On Thu, Mar 9, 2017 at 12:52 PM, Oscar Batori <oscarbat...@gmail.com> wrote:
> Guys,
>
> To change the subject from meta-voting...
>
> We are doing Spark Streaming against a Kafka setup, everything is pretty
> standard, and pretty current. In particular we are using Spark 2.1, and
> Kafka 0.10.1, with batch windows that are quite large (5-10 minutes). The
> problem we are having is pretty well described in the following excerpt from
> the Spark documentation:
> "For possible kafkaParams, see Kafka consumer config docs. If your Spark
> batch duration is larger than the default Kafka heartbeat session timeout
> (30 seconds), increase heartbeat.interval.ms and session.timeout.ms
> appropriately. For batches larger than 5 minutes, this will require changing
> group.max.session.timeout.ms on the broker. Note that the example sets
> enable.auto.commit to false, for discussion see Storing Offsets below."
>
> In our case "group.max.session.timeout.ms" is set to default value, and our
> processing time per batch easily exceeds that value. I did some further
> hunting around and found the following SO post:
> "KIP-62, decouples heartbeats from calls to poll() via a background
> heartbeat thread. This, allow for a longer processing time (ie, time between
> two consecutive poll()) than heartbeat interval."
>
> This pretty accurately describes our scenario: effectively our per batch
> processing time is 2-6 minutes, well within the batch window, but in excess
> of the max session timeout between polls, causing the consumer to be kicked
> out of the group.
>
> Are there any plans to move the Kafka client up to 0.10.1 and make this
> feature available to consumers? Or have I missed some helpful configuration
> that would ameliorate this problem? I recognize changing
> "group.max.session.timeout.ms" is one solution, though it seems doing
> heartbeat checking outside of implicitly piggy backing on polling seems more
> elegant.
>
> -Oscar
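
For reference, the workaround from the quoted docs excerpt can be sketched roughly as follows. This is only an illustration under stated assumptions, not what Spark does internally: the class name KafkaParamsSketch, the method kafkaParamsFor, the 60 second margin and the 30 second request slack are all made up for the example, and the broker's group.max.session.timeout.ms is passed in by hand rather than read from the broker. The request.timeout.ms entry is there because, as far as I recall, the 0.10.0 consumer rejects a config where it is not larger than session.timeout.ms.

```java
import java.util.HashMap;
import java.util.Map;

public class KafkaParamsSketch {

    /**
     * Builds consumer settings for batches that may take up to maxBatchMs to process.
     * brokerMaxSessionMs is the broker's group.max.session.timeout.ms, supplied by
     * the caller (this sketch does not query the broker).
     */
    public static Map<String, Object> kafkaParamsFor(long maxBatchMs, long brokerMaxSessionMs) {
        // Hypothetical safety margin on top of the worst-case processing time.
        long sessionTimeoutMs = maxBatchMs + 60_000L;
        if (sessionTimeoutMs > brokerMaxSessionMs) {
            // The broker refuses session timeouts above its configured maximum,
            // so this case needs a broker-side change first.
            throw new IllegalArgumentException(
                "session.timeout.ms " + sessionTimeoutMs
                + " exceeds broker group.max.session.timeout.ms " + brokerMaxSessionMs);
        }
        Map<String, Object> params = new HashMap<>();
        params.put("session.timeout.ms", Math.toIntExact(sessionTimeoutMs));
        // Kafka's docs suggest a heartbeat interval of at most a third of the session timeout.
        params.put("heartbeat.interval.ms", Math.toIntExact(sessionTimeoutMs / 3));
        // Assumed slack so that request.timeout.ms stays above session.timeout.ms.
        params.put("request.timeout.ms", Math.toIntExact(sessionTimeoutMs + 30_000L));
        // Matches the docs example, which disables auto commit.
        params.put("enable.auto.commit", false);
        return params;
    }

    public static void main(String[] args) {
        // A 4 minute worst case against a broker maximum of 300000 ms.
        System.out.println(kafkaParamsFor(4 * 60_000L, 300_000L));
    }
}
```

The resulting entries would be merged into the kafkaParams map passed to the direct stream. With a 6 minute worst case and a broker maximum of 300000 ms the method throws, which is exactly the situation where group.max.session.timeout.ms has to be raised on the broker.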