[ https://issues.apache.org/jira/browse/SPARK-25106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591834#comment-16591834 ]
Alexis Seigneurin commented on SPARK-25106:
-------------------------------------------

I just built the code from [https://github.com/apache/spark/releases/tag/v2.3.2-rc5] and the issue seems to be gone 👍

> A new Kafka consumer gets created for every batch
> -------------------------------------------------
>
>                 Key: SPARK-25106
>                 URL: https://issues.apache.org/jira/browse/SPARK-25106
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.3.1
>            Reporter: Alexis Seigneurin
>            Priority: Major
>         Attachments: console.txt
>
> I have a fairly simple piece of code that reads from Kafka, applies some
> transformations - including a UDF - and writes the result to the console.
> Every time a batch is created, a new consumer is created (and not closed),
> eventually leading to a "too many open files" error.
> I created a test case, with the code available here:
> [https://github.com/aseigneurin/spark-kafka-issue]
> To reproduce:
> # Start Kafka and create a topic called "persons"
> # Run "Producer" to generate data
> # Run "Consumer"
> I am attaching the log, where you can see a new consumer being initialized
> between every batch.
> Please note this issue does *not* appear with Spark 2.2.2, and it does not
> appear either when I don't apply the UDF.
> I suspect - although I did not go far enough to confirm - that this issue
> is related to the improvement made in SPARK-23623.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
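The reporter's actual test case is in the linked repository; purely as an illustration of the shape of pipeline described (read from Kafka, apply a UDF, write to the console), a minimal sketch might look like the following. The topic name "persons" comes from the repro steps above; the bootstrap server, column names, and the UDF itself are assumptions, not the reporter's code, and running it requires a Spark runtime and a Kafka broker.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object Consumer {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-kafka-issue")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical UDF: per the report, the leak only shows up when a
    // UDF is present in the query plan.
    val upper = udf((s: String) => if (s == null) null else s.toUpperCase)

    val query = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // assumed address
      .option("subscribe", "persons")
      .load()
      .select(col("value").cast("string").as("json"))
      .withColumn("json_upper", upper(col("json")))
      .writeStream
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```

With the reported bug, each micro-batch logs a fresh `ConsumerConfig` initialization (visible in the attached console.txt) instead of reusing a cached consumer, so file descriptors grow until the "too many open files" error.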