[ https://issues.apache.org/jira/browse/SPARK-30745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harneet K updated SPARK-30745:
------------------------------
    Description: 
We have a Spark Streaming application reading data from Kafka.
 Data size: 15 million

The following errors were seen:
 java.lang.AssertionError: assertion failed: Failed to get records for spark-executor- .... after polling for 512
 at scala.Predef$.assert(Predef.scala:170)

Further errors were seen pertaining to CachedKafkaConsumer:
 at org.apache.spark.streaming.kafka010.CachedKafkaConsumer.get(CachedKafkaConsumer.scala:74)
 at org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:227)
 at org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:193)
 at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
 at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:214)
 at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:935)
 at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:926)
 at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:866)
 at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:926)
 at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:670)
 at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:330)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:281)
 at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
 at org.apache.spark.scheduler.Task.run(Task.scala:86)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)

 

The spark.streaming.kafka.consumer.poll.ms setting is at its default of 512 ms, and the other Kafka consumer timeout settings are also at their defaults (see the configuration sketch after this list):
 "request.timeout.ms" 
 "heartbeat.interval.ms" 
 "session.timeout.ms" 
 "max.poll.interval.ms" 

Also, Kafka was recently upgraded from 0.8 to 0.10; this behavior was not seen on 0.8.
No resource issues are seen.

 

> Spark streaming, kafka broker error, "Failed to get records for 
> spark-executor- .... after polling for 512"
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-30745
>                 URL: https://issues.apache.org/jira/browse/SPARK-30745
>             Project: Spark
>          Issue Type: Bug
>          Components: Build, Deploy, DStreams, Kubernetes
>    Affects Versions: 2.0.2
>         Environment: Spark 2.0.2, Kafka 0.10
>            Reporter: Harneet K
>            Priority: Major
>              Labels: Spark2.0.2, cluster, kafka-0.10, spark-streaming-kafka
>


